ABSTRACT

The purpose of this paper is to present a multi-level operational C2 holonic reference architecture that is applicable to Navy maritime headquarters (MHQ) with maritime operations center (MOC) for assessing, planning and executing multiple missions and tasks across a range of military operations. The control architecture consists of three levels: strategic level control (SLC), operational level control (OLC) and tactical level control (TLC). In addition to coordination within each level, two specific coordination layers are identified at the SLC-OLC and the OLC-TLC interfaces. The SLC-OLC interface layer resolves evaluation issues associated with selecting DIME (diplomatic, information, military and economic) actions based on national priorities, while the OLC-TLC interface layer resolves mission monitoring/planning issues associated with deciding on courses of action based on the outcomes of asset-task allocation at the tactical level. We employ a semi-Markov decision process (SMDP) approach to decide on the missions to be executed and their time sequence at the SLC-OLC layer (coordination of future plans), and a distributed SMDP approach over an action-goal attainment (AGA) graph to address the mission monitoring/planning issues related to task sequencing and asset allocation at the OLC-TLC layer (coordination of future operations and current operations). The times between decision epochs at the SLC-OLC layer are determined by the mission completion times at the OLC-TLC layer, while the DIME actions and missions to be planned at the OLC-TLC layer are determined at the SLC-OLC layer.

Keywords:  holonic  reference  architecture  (HRA),  maritime  headquarters  (MHQ), 
maritime  operations  centers  (MOC),  strategic  level  control  (SLC),  operational  level 
control  (OLC)  and  tactical  level  control  (TLC),  semi-Markov  decision  process  (SMDP), 
action-goal  attainment  (AGA)  graph,  diplomatic,  information,  military  and  economic 
(DIME)  actions 


I.  INTRODUCTION 

Motivation 

The  term  maritime  headquarters  refers  generically  to  those  Navy  operational-level 
commands  with  the  capability  to  assess,  plan,  and  execute  at  the  operational  level  of  war 
and  the  term  is  inclusive  of  the  commander,  the  staff  and  the  facilities  [1].  The  Navy’s 
new  concept  of  incorporating  MHQ  with  MOC  emphasizes  standardized  processes  and 
methods,  centralized  assessment  and  guidance,  networked  distributed  planning 
capabilities,  and  decentralized  execution  for  assessing,  planning  and  executing  missions 
across  a  range  of  military  operations.  The  assessment  is  a  continuous  process,  whose 
primary  purpose  is  to  provide  the  commander  (CDR)  with  a  comprehensive  report  of 
progress  made  with  regard  to  the  achievement  of  maritime  objectives.  This  overall 
objective  assessment  combines  the  monitored  outcomes  of  mission  execution  and  the 
analyzed  effects  of  operations  (diplomatic,  information,  military  or  economic),  with  the 
situational  awareness  to  inform  future  development  of  plans,  prioritize  ISR  activities  and 
allocate  forces.  The  maritime  planning  process  contributes  to  the  development  of  the 
CDR’s  guidance,  an  executable  plan  and  orders  to  tactical  forces.  The  planning  process 
is informed by guidance from higher headquarters and the assessment process, and
should  be  highly  collaborative  both  vertically,  with  higher  headquarters  and  subordinates, 
and  horizontally,  with  other  MOCs  and  joint  components.  The  maritime  planning 
processes  focus  on  the  desired  objectives  and  operational  effects  required  by  higher 
headquarters  guidance.  In  execution,  the  CDR  will  command  by  directing,  monitoring, 
assessing  and  re-directing  forces.  The  primary  tool  for  the  MOC  will  be  the  collaborative 
information  environment  (CIE).  Important  to  effective  execution  is  operational 
environment  awareness,  horizontal  and  vertical  integration  with  other  commands  and 
continuous  assessment. 

In this paper, we model the coordination issues inherent in the MHQ with MOC via a three-level architecture that links the tactical, operational and strategic levels of decision making. We seek to demonstrate that the C2 coordination issues at these three levels (strategic, operational and tactical), associated with DIME actions (future plans) and mission planning (future operations and current operations), can be modeled and addressed using the proposed architecture.

Related  research  and  new  contributions 

Our previous research on C2 organizational design has included the modeling and synthesis of organizational structures at the tactical level to achieve a set of command objectives, such as maximizing the speed of command, minimizing coordination, and balancing workload. Levchuk et al. [2-4] developed the following three-phase process to design mission-congruent organizations:

Phase  I:  The  first  phase  of  the  design  process  determines  the  task-asset  allocation  and 
task  sequencing  that  optimizes  mission  objectives  (e.g.,  mission  completion  time, 
accuracy,  workload,  asset  utilization,  asset  coordination,  etc.),  taking  into  account  task 
precedence  constraints  and  synchronization  delays,  task-resource  requirements,  resource 
capabilities,  as  well  as  geographical  and  other  task  transition  constraints.  The  generated 
task-asset  allocation  schedule  specifies  the  workload  of  each  asset.  In  addition,  for  every 
mission  task,  the  first  phase  of  the  algorithm  delineates  a  set  of  non-redundant  asset 
packages  capable  of  jointly  processing  a  task.  This  information  is  later  used  for  iterative 
refinement  of  the  design,  and,  if  necessary,  for  on-line  strategy  adaptation. 

Phase  II:  The  second  phase  of  the  design  process  combines  assets  into  nonintersecting 
groups,  to  match  the  operational  expertise  and  workload  threshold  constraints  on 
available  DMs,  and  assigns  each  group  to  an  individual  DM  to  define  the  DM-asset 
allocation.  Thus,  the  second  phase  delineates  the  DM-asset-task  allocation  schedule  and, 
consequently,  the  individual  operational  workload  of  each  DM. 

Phase  III:  Finally,  Phase  III  of  the  design  process  completes  the  design  by  specifying  a 
communication  structure  and  a  decision  hierarchy  to  optimize  the  responsibility 
distribution  and  inter-DM  control  coordination,  as  well  as  to  balance  the  control 
workload  among  DMs  according  to  their  expertise  constraints. 
Each phase of the algorithm provides, if necessary, feedback to the previous stages to
iteratively  modify  the  task-asset  allocation  and  DM-asset-task  schedule.  Phase  I  of  the 
design  process  essentially  performs  mission  planning,  while  Phases  II  and  III  construct 
the  organization  to  match  the  devised  courses  of  action. 
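The task-asset allocation of Phase I can be illustrated with a small greedy list-scheduling sketch. It is a simplification under stated assumptions, not the algorithm of [2-4]: each task is assumed to need one asset with a single named capability (rather than a package of assets covering a resource vector), and the `Task` class, asset names and capabilities below are hypothetical. At every step the sketch schedules the ready task (all predecessors scheduled) that can finish earliest on some capable asset.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    duration: float
    capability: str                            # hypothetical: one capability per task
    preds: list = field(default_factory=list)  # names of predecessor tasks


def list_schedule(tasks, assets):
    """Greedy list scheduling of tasks onto assets under precedence constraints.

    assets maps an asset name to its set of capabilities (hypothetical model).
    Returns {task_name: (asset, start, finish)}.
    """
    by_name = {t.name: t for t in tasks}
    asset_free = {a: 0.0 for a in assets}  # time at which each asset becomes idle
    schedule = {}
    remaining = set(by_name)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = sorted(n for n in remaining
                       if all(p in schedule for p in by_name[n].preds))
        if not ready:
            raise ValueError("precedence graph contains a cycle")
        best = None  # (task, asset, start, finish) with the earliest finish
        for n in ready:
            t = by_name[n]
            release = max((schedule[p][2] for p in t.preds), default=0.0)
            for a, caps in assets.items():
                if t.capability in caps:
                    start = max(release, asset_free[a])
                    if best is None or start + t.duration < best[3]:
                        best = (n, a, start, start + t.duration)
        if best is None:
            raise ValueError("no asset can process any ready task")
        n, a, start, finish = best
        schedule[n] = (a, start, finish)
        asset_free[a] = finish
        remaining.remove(n)
    return schedule


# Usage with hypothetical tasks and assets.
tasks = [Task("recon", 2, "isr"),
         Task("strike", 3, "strike", ["recon"]),
         Task("relief", 4, "lift")]
assets = {"P-8": {"isr"}, "DDG": {"strike", "isr"}, "LHD": {"lift"}}
sched = list_schedule(tasks, assets)
makespan = max(finish for _, _, finish in sched.values())
```

In this toy instance, recon runs on P-8 over [0, 2], relief on LHD over [0, 4] and strike on DDG over [2, 5], for a makespan of 5. The per-asset schedule is what Phase II would then partition among DMs.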


Figure 1. Three-phase organizational design process.

C2  architecture  can  be  organized  as  a  hierarchy,  heterarchy,  or  a  holarchy.  A  hybrid 
organizational  structure,  termed  the  holonic  structure  or  the  holarchy,  is  proposed  in  order 
to overcome the drawbacks of both the hierarchy and the heterarchy. The term ‘holonic’
is derived from the word ‘holon’, and was introduced by Koestler in the context of social
and  living  organisms  [8].  This  word  is  a  combination  of  the  Greek  ‘holos’  meaning 
whole,  with  the  suffix  ‘on’  which,  as  in  proton  or  neutron,  suggests  a  particle  or  part. 
The  holon,  then,  implies  a  combination  of  ‘wholes’  and  ‘parts’.  Thus,  ‘holons’  refer  to 
autonomous  self-reliant  units  (“cells”),  which  hold  a  degree  of  independence  and  are  able 
to  manage  local  contingencies  without  interference  from  their  superiors. 

The holonic structure combines the best features of hierarchical and heterarchical structures, and addresses key requirements of C2 organizational structures operating in dynamic and uncertain environments. It is a hierarchy of self-regulating holons that offers the ability to model and control very complex systems, high resilience to internal and external disturbances, and adaptability to changes in the environment [9]. Within a holonic organization, holons can dynamically create and change hierarchies. They can be both autonomous and cooperative. That is, holons can handle circumstances and incidents based on their own knowledge and the information available, without interference from superiors; at the same time, holons can still receive instructions from, or be controlled by, their superiors. This combined hierarchical and heterarchical behavior ensures superior performance in complex C2 operations.
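The autonomous yet cooperative behavior of a holon can be sketched as a small class. This is a hypothetical illustration, not part of the architecture's specification: the `Holon` class, its capability sets and the escalate-when-unable rule are assumptions made only to show local handling of contingencies, escalation to a superior, and instructions flowing down the holarchy.

```python
class Holon:
    """Hypothetical holon: autonomous for contingencies it can handle,
    cooperative in escalating the rest and in accepting orders from above."""

    def __init__(self, name, capabilities, superior=None):
        self.name = name
        self.capabilities = set(capabilities)
        self.superior = superior
        self.subordinates = []
        self.log = []
        if superior is not None:
            superior.subordinates.append(self)

    def handle(self, contingency, needs):
        """Resolve locally when our capabilities cover `needs`, else escalate.
        Returns the name of the holon that resolved it, or None."""
        if set(needs) <= self.capabilities:
            self.log.append(f"resolved {contingency}")
            return self.name
        if self.superior is None:
            self.log.append(f"unresolved {contingency}")
            return None
        self.log.append(f"escalated {contingency}")
        return self.superior.handle(contingency, needs)

    def instruct(self, order):
        """Accept an order from a superior and pass it down the holarchy."""
        self.log.append(f"order {order}")
        for sub in self.subordinates:
            sub.instruct(order)


# Hypothetical three-level holarchy.
slc = Holon("SLC", {"diplomatic", "economic"})
olc = Holon("OLC", {"replan"}, superior=slc)
tlc = Holon("TLC-1", {"reschedule"}, superior=olc)
print(tlc.handle("asset loss", {"reschedule"}))        # handled locally by TLC-1
print(tlc.handle("mission conflict", {"diplomatic"}))  # escalated up to the SLC
```

Representing autonomy as "resolve if capable" keeps the sketch small; a fuller holon would also weigh workload and negotiate with peers before escalating.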

Yu et al. [5-6] employed concepts from group technology and nested genetic algorithms to solve the holonic coordination problem in a two-level structure (operational and tactical levels) for planning and executing a single mission. The focus was on the asset allocation and task scheduling problems of expeditionary strike groups (ESG). Park et al. [7] modeled three-level structures (viz., strategic, operational, and tactical levels) for MHQ with MOC facing multiple simultaneous or sequential missions.
Over the years, research in reinforcement learning (RL) has advanced to the point that realistic, partially observed stochastic control problems involving semi-Markov decision process (SMDP) models are solvable. In addition, hierarchical reinforcement learning (HRL) techniques have been proposed, including options [16], hierarchies of abstract machines (HAMs) [17] and the MAXQ value function decomposition [18]. These HRL techniques rely on the theory of SMDPs for their formal basis. Sutton et al. [16] formalized learning, planning, and representing knowledge at multiple levels of temporal abstraction. Parr [17] developed an approach to hierarchically structuring MDP policies, termed HAMs; the emphasis is on simplifying complex MDPs by restricting the class of realizable policies rather than expanding the action choices. Dietterich [18] developed another approach to HRL, termed the MAXQ value function decomposition, which relies on the theory of SMDPs in a manner similar to options and HAMs; however, unlike options and HAMs, the MAXQ approach does not reduce the entire problem to a single SMDP. Instead, it creates a hierarchy of SMDPs whose solutions can be learned simultaneously. Rohanimanesh and Mahadevan [19] investigated a model for planning under uncertainty with temporally extended actions, where multiple actions can be taken concurrently at each decision epoch.
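The formal basis these HRL methods share is the SMDP backup, in which the bootstrap term is discounted by the duration of the temporally extended action: $Q(s,o) \leftarrow Q(s,o) + \alpha\,[\,r + \gamma^{k} \max_{o'} Q(s',o') - Q(s,o)\,]$, where $r$ is the discounted reward accumulated over the $k$ steps that option $o$ ran. A minimal sketch of this SMDP Q-learning update follows; the function name, the reward-maximizing sign convention and the default `alpha`/`gamma` values are assumptions for illustration.

```python
from collections import defaultdict


def smdp_q_update(Q, s, o, rewards, s_next, options_next, alpha=0.1, gamma=0.95):
    """One SMDP Q-learning backup for a temporally extended action `o`.

    rewards: the per-step rewards received while `o` executed, so the
    sojourn time is k = len(rewards). options_next: options available in
    s_next (empty if s_next is terminal).
    """
    k = len(rewards)
    # discounted return accumulated while the option ran
    r = sum(gamma ** i * rew for i, rew in enumerate(rewards))
    # bootstrap from the next decision epoch, discounted by gamma ** k
    best_next = max((Q[(s_next, o2)] for o2 in options_next), default=0.0)
    Q[(s, o)] += alpha * (r + gamma ** k * best_next - Q[(s, o)])
    return Q[(s, o)]


Q = defaultdict(float)  # Q-values default to 0 for unseen (state, option) pairs
```

The only difference from ordinary Q-learning is the exponent $k$ on $\gamma$: a longer-running option pushes the future further away.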

The present work extends the work in [7] on multi-level coordination problems in MHQ with MOC by developing a two-level SMDP process: an SMDP decides on the missions to be executed and their time sequence at the SLC-OLC layer (coordination of future plans), while an SMDP approach over an action-goal attainment (AGA) graph [14] addresses the mission monitoring/planning issues related to task sequencing and asset allocation at the OLC-TLC layer (coordination of future operations and current operations).

The contributions of this paper are fourfold. First, the three-level architecture provides a solution to the C2 coordination problem involving a higher-level authority's intent (e.g., desired effects at the strategic level) and mission sequencing, mission planning, mission monitoring and mission execution at the SLC-OLC-TLC levels. The second contribution is the coordination mechanism between the SLC-OLC interface layer and the OLC-TLC interface layer, which enables the two layers to share the results of the individual SMDP problems solved at each layer, viz., DIME action selection and mission sequencing at the SLC-OLC layer, and mission planning and mission monitoring at the OLC-TLC layer. The third contribution is the use of distributed SMDPs at the OLC-TLC layer to solve individual mission planning problems. Finally, the paper provides a framework for employing multi-level organizational structures in the USN's complex and distributed coordination problems involving MHQ with MOC.

Organization  of  the  Paper 

This paper is organized as follows. Section II describes our three-level C2 organizational design model and introduces a holonic reference architecture (HRA). Section III shows how a two-layer SMDP is applied at the SLC-OLC and OLC-TLC layers. An application example of the approach to sequencing and planning multiple missions is discussed in Section IV. The example exhibits the processes of centralized assessment and guidance, distributed and collaborative planning, and decentralized execution: it employs centralized decision making at the strategic level via an SMDP, collaborative planning at the operational level using distributed SMDPs (specifying the alternative task paths for missions and delineating mission phases), and negotiation mechanisms at the lower level to resolve scheduling conflicts. Finally, the paper concludes with a summary of key findings and future research directions in Section V.


II. STRUCTURE OF HOLONIC C2 REFERENCE ARCHITECTURE

Three-level  Control  Architecture 

Within the scope of decentralized C2 requirements, the control architecture should be distributed, abstract and generalized. The control is abstract in the sense that its assumptions about the internal structure and behavior of other DMs should be as unrestrictive as possible. Generalized control requires that each holon be cloned from a set of basic structures. The distributed control should also be both reactive and self-organizing, i.e., able to respond to environmental disturbances and adapt to changes during the mission execution process. We categorize the C2 architectural concepts into the strategic, operational and tactical levels, as shown in Fig. 2.


Figure  2.  Three  level  holonic  reference  control  architecture. 

Strategic  Level  Control  (SLC)  Architecture 

The SLC architecture provides a structure for establishing mission objectives and guidance for future plans. At this level, the process is focused on national/international objectives. It gives strategic-level guidance to MHQ commanders in the form of potential DIME actions for various missions, available assets, and mechanisms for resolving mission conflicts as they arise during subsequent planning and operations. This level also decides on the time sensitivity of multiple missions and ensures that the missions meet the strategic objectives. We model the strategic guidance using an SMDP, which decides on the sequence of DIME actions for missions subject to national-level constraints (i.e., diplomatic, information, military and economic). That is, the SMDP decides which DIME actions should be executed for the various missions to be planned at the operational level (future plans).

Operational Level Control (OLC) Architecture

The  OLC  architecture  provides  facilities  for  mission  decomposition  (i.e.,  generating  the 
task  graph),  deliberate  planning  (future  operations),  command,  and  inter-holon 
coordination/negotiation.  At  this  level,  the  process  is  focused  on  meeting  the  strategic 
guidance  of  the  SLC  by  integrating  and  synchronizing  key  objectives  at  all  levels  of  war. 
It  seeks  to  produce  an  initial  force  structure  that  places  the  subordinate  units  at  the  right 
place  and  at  the  right  time  prior  to  mission  execution.  During  the  current  operations,  it 
monitors  the  real-time  mission  execution  and  its  effects,  and  adjusts  the  initial  plan,  if 
needed,  to  ensure  that  the  mission  is  successfully  completed.  It  also  has  a  collaboration 
mechanism  to  resolve  conflicts  among  multiple  missions  based  on  the  selected  DIME 
actions  at  the  SLC.  We  model  the  operational  objectives  using  goal-action  graphs 
involving  OR  nodes  (that  represent  alternate  paths  to  accomplish  the  end  goals  of  the 
mission),  AND  nodes  (representing  sub-goals  that  are  necessary  to  accomplish  the  end 
goals),  and  Exclusive  OR  (XOR)  nodes  (representing  actions  and/or  goals  that  are  in 
conflict  or  at  odds  with  each  other)  [14]. 
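The node semantics can be made concrete with a short recursive evaluator. In this hypothetical sketch an AND node is attained when all of its children are, an OR node when any child is, and an XOR node when exactly one child is, which is one possible reading of "in conflict or at odds"; the graph encoding and node names are assumptions, not the graph of Fig. 5.

```python
def attained(node, graph, achieved):
    """Return True if `node` is attained.

    graph maps an internal node to (kind, children) with kind in
    {"AND", "OR", "XOR"}; leaves are absent from graph and are attained
    exactly when they appear in the set `achieved`.
    """
    if node not in graph:
        return node in achieved
    kind, children = graph[node]
    vals = [attained(c, graph, achieved) for c in children]
    if kind == "AND":
        return all(vals)          # every necessary sub-goal must hold
    if kind == "OR":
        return any(vals)          # any alternative path suffices
    if kind == "XOR":
        return sum(vals) == 1     # conflicting options: exactly one may hold
    raise ValueError(f"unknown node kind {kind!r}")


# Hypothetical mission graph.
graph = {
    "end_goal": ("AND", ["relief_delivered", "posture"]),
    "relief_delivered": ("OR", ["airlift", "sealift"]),
    "posture": ("XOR", ["show_of_force", "low_profile"]),
}
print(attained("end_goal", graph, {"sealift", "low_profile"}))  # True
```

Adding "show_of_force" to the achieved set would make the XOR node, and hence the end goal, fail, which is how conflicting actions surface in this reading.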

Tactical  Level  Control  (TLC)  Architecture 

The TLC architecture encapsulates the functional holons that execute the assigned sub-missions or tasks (current operations). This tactical process involves local task scheduling, battlefield pattern recognition, and a negotiation mechanism. It also provides an interface to the physical assets. The TLC architecture can have more than one TLC instance (TLC unit); the number of instances is decided by deliberate planning at the OLC. TLC units can be dynamically added or deleted according to the perceived mission environment. A negotiation mechanism is provided for the TLC units to resolve conflicts among themselves or to provide coordination as needed.

Two coordinating decision layers, at the SLC-OLC and the OLC-TLC interfaces, couple the three levels of the architecture. The first decision layer (the SLC-OLC layer) decides on DIME action sequences for multiple missions, and the second decision layer (the OLC-TLC layer) solves the mission planning problem under asset constraints. Task status reports from subordinate holons at the TLC are sent up to holons at the OLC, which monitor and supervise the overall progress of the mission and promulgate adjustments of tactical actions to lower-level holons. If missions are in conflict at the OLC, the OLC requests strategic guidance from the SLC to resolve the conflict(s) while still achieving long-term strategic objectives.


III.  TWO  LAYER  SMDP  HIERARCHY 

The hierarchical decision (learning) process spanning the two layers of coordination is shown in Fig. 3. At the SLC-OLC layer, we model the optimization problem of selecting the DIME actions to achieve the desired effects as a semi-Markov decision process (SMDP). At the OLC-TLC layer, the planning problems for the individual missions are modeled as distributed SMDPs. A discrete-time SMDP is a generalization of an MDP in which actions take a variable amount of time to complete. The SMDP that solves the DIME action selection problem at the SLC-OLC layer is denoted by $\langle X, U, P(T), R(T) \rangle$. Here $X$ is the state space, $U$ is the action set, $P(T)$ is the matrix of action- and time-dependent state transition probabilities, and $R(T)$ is the action- and state-dependent reward structure, which is also a function of the (random) time $T$ between decision epochs. The time between decision epochs $T$ at the SLC-OLC layer is an output of the mission planning problem; each mission planning problem is solved using another SMDP model at the OLC-TLC layer, denoted by $\langle \mathcal{X}, \mathcal{U}, \mathcal{P}(\mathcal{T}), \mathcal{R}(\mathcal{T}) \rangle$. Thus, multiple SMDPs run concurrently at the OLC-TLC layer. The SMDP at the OLC-TLC layer models each mission as a goal-attainment graph [14], and the time between decision epochs of this SMDP, denoted by $\mathcal{T}$, is an output of the TLC: the completion time of a single task (sub-goal) of the goal-attainment graph.
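To make the SMDP notation concrete, the following is a minimal discounted, discrete-time SMDP value iteration for cost minimization. It assumes (hypothetically) that each action's outcomes are given as (probability, next state, sojourn time T) triples and that the reward structure has been reduced to an expected cost per decision; the discount factor is raised to the power T, which is what distinguishes the SMDP backup from the MDP one. The model, function name and numbers are illustrative only.

```python
def smdp_value_iteration(states, actions, model, cost, gamma=0.9, tol=1e-9, max_iter=10_000):
    """Discounted discrete-time SMDP value iteration (cost minimization).

    model[(x, u)]: list of (prob, x_next, T) with T >= 1 the sojourn time.
    cost[(x, u)]:  expected cost incurred between the two decision epochs.
    """
    V = {x: 0.0 for x in states}

    def q(x, u):
        # expected cost now plus cost-to-go discounted by the sojourn time
        return cost[(x, u)] + sum(p * gamma ** T * V[y] for p, y, T in model[(x, u)])

    for _ in range(max_iter):
        V_new = {x: min(q(x, u) for u in actions[x]) for x in states}
        delta = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if delta < tol:
            break
    policy = {x: min(actions[x], key=lambda u: q(x, u)) for x in states}
    return V, policy


# Hypothetical toy model: "slow" is cheaper per decision but may loop back.
states = ["x1", "done"]
actions = {"x1": ["fast", "slow"], "done": ["stay"]}
model = {
    ("x1", "fast"): [(1.0, "done", 1)],
    ("x1", "slow"): [(0.5, "x1", 2), (0.5, "done", 2)],
    ("done", "stay"): [(1.0, "done", 1)],
}
cost = {("x1", "fast"): 4.0, ("x1", "slow"): 3.0, ("done", "stay"): 0.0}
V, policy = smdp_value_iteration(states, actions, model, cost)
```

Here the chance of looping back makes the slow action more expensive overall, so the policy picks the fast action with $V(x_1) = 4$.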


Figure 3. Two-layer coordination architecture (DM 0: Combatant Commander; DM 1: MHQ/MOC Commander).

SMDP at the SLC-OLC layer ~ $\langle X, U, P(T), R(T) \rangle$

Given an MDP and a set of concurrent temporally extended actions defined on it, the decision process that selects only among multi-actions (sets of concurrent actions) and executes each one until its termination, according to a given termination condition, forms an SMDP. The SMDP at the SLC-OLC layer is formulated by hierarchically learning concurrent action plans over temporally extended actions, whose durations are the variable times needed to complete individual missions at the OLC-TLC layer. The concurrent actions are hierarchical: they comprise both the courses of action of individual missions at the OLC-TLC layer and the goal-attained actions of parallel missions at the TLC. Here, the goal-attained action is the completion time of a goal-attained task, which is taken into account in deciding the courses of action at the OLC-TLC layer. In the SMDP $\langle X, U, P(T), R(T) \rangle$, the time between decision epochs $T$ is defined as the time until any of the missions corresponding to the current state in $X$ at the SLC-OLC layer is completed; it is directly affected by the courses of action chosen at the OLC-TLC layer.

The SMDP at the SLC-OLC layer is formalized as follows (mathematical details are included in the Appendix).

• The state represents the mission space. We consider a scenario in which the SLC-OLC layer needs to dynamically select a state-dependent policy that decides on the DIME action sequences for multiple missions to be planned at the OLC-TLC layer. Combinations of military missions, such as peacekeeping, HA/DR (Humanitarian Assistance and Disaster Relief), stability operations, and major combat operations, constitute the states of the SMDP (see Table 1).

• The actions represent feasible paths (courses of action) in a directed acyclic network of DIME action sequences from a source node to a destination node, as illustrated for a hypothetical scenario in Fig. 4. Note that each mission has a different network graph consisting of feasible action sets.

• The state transition probability is defined as the probability of being in the next state at a decision epoch that is $T$ time steps ahead, given the current state and an action.

• The reward function represents the expected national-level resource usage costs, given the current state and an action.
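Because each SMDP action is a feasible source-to-destination path in a DIME action network, the action set for a mission can be enumerated by a depth-first traversal of that acyclic graph. A minimal sketch, assuming edges carry the resources required by the next action (as in Fig. 4); the node names and resource numbers below are hypothetical, not those of Fig. 4.

```python
def dime_paths(network, source, dest):
    """Yield (path, total_resources) for every source->dest path of an
    acyclic DIME action network given as node -> [(next_node, resource), ...].
    No visited set is kept, so the network must be acyclic."""
    stack = [(source, [source], 0.0)]
    while stack:
        node, path, total = stack.pop()
        if node == dest:
            yield path, total
            continue
        for nxt, res in network.get(node, []):
            stack.append((nxt, path + [nxt], total + res))


# Hypothetical two-phase network: D = diplomatic, I = information,
# M = military, E = economic; the digit is the phase.
network = {
    "src": [("D1", 2), ("M1", 5)],
    "D1": [("I2", 1), ("E2", 3)],
    "M1": [("I2", 1)],
    "I2": [("dst", 0)],
    "E2": [("dst", 0)],
}
for path, total in dime_paths(network, "src", "dst"):
    print(" -> ".join(path), total)
```

Each printed path is one candidate action; in this toy network the least resource-intensive one is src -> D1 -> I2 -> dst with 3 resource units. Full enumeration grows exponentially with the number of phases, so larger networks would call for a shortest-path or DP formulation instead.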

This is how the process works. The OLC-TLC layer provides the SLC-OLC layer with the results of previous courses of action (e.g., completion times for various courses of action). Thus, the completion times of missions constitute the decision epochs of the SMDP at the SLC-OLC layer. The results from the OLC-TLC layer (an output of the local SMDPs at the OLC-TLC layer) affect the state transition probabilities of the SMDP at the SLC-OLC layer. Formally, we define the holding time distribution function $F(T(k) \mid x(k), u_j(k))$ at time $k$ as the probability that any of the missions corresponding to state $x(k)$ at the SLC-OLC layer finishes at time $T(k)$, i.e.,

$$F(T(k) \mid x(k), u_j(k)) = 1 - \prod_{i=1}^{N} \Big(1 - \phi_i\big(x(k+1), T(k)\big)\, z_i(k)\Big), \qquad (1)$$

where $z_i(k)$ denotes the status of mission $i$ at time $k$ in Table 1: $z_i = 1$ denotes the presence of a mission and $0$ its absence, and $\phi_i(x(k+1), T(k))$ is the probability that mission $i$ terminates in state $x(k+1)$ after $T(k)$ according to its termination condition (see the details in the Appendix). The right-hand side of eq. (1) thus denotes the probability that at least one of the missions terminates in state $\{x(k+1), T(k)\}$ according to its termination condition.
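Eq. (1) can be evaluated directly once the per-mission termination probabilities are known. A minimal sketch; the function name and the example termination probabilities are hypothetical, and the product form is exact only if missions terminate independently of one another.

```python
import math


def holding_time_prob(phi, z):
    """Eq. (1): probability that at least one active mission terminates.

    phi[i]: probability that mission i terminates in state x(k+1) after T(k)
            according to its termination condition (hypothetical inputs).
    z[i]:   1 if mission i is present in state x(k), else 0 (Table 1).
    The product form treats the missions' terminations as independent.
    """
    if len(phi) != len(z):
        raise ValueError("phi and z must have the same length")
    return 1.0 - math.prod(1.0 - p * zi for p, zi in zip(phi, z))


# State x4 of Table 1: peacekeeping and HA/DR present, the other two absent.
print(holding_time_prob([0.5, 0.2, 0.9, 0.4], [1, 1, 0, 0]))
```

Absent missions ($z_i = 0$) contribute a factor of 1 and so drop out, leaving $1 - 0.5 \times 0.8 = 0.6$ for this state.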
The SMDP at the SLC-OLC layer selects the DIME actions that minimize the total expected national-level resource usage cost, given the current mission state. The SLC-OLC layer affects the reward functions of the local SMDPs at the OLC-TLC layer by providing DIME policy-dependent mission weights to each of the missions. For a given DIME policy $\pi$ (i.e., a path in the DIME action network), the mission weight is defined as the mean of the individual action weights $\{b^{\pi}_m\}_{m=1}^{M}$ over the phases at the SLC-OLC layer:

$$b^{\pi} = \frac{1}{M} \sum_{m=1}^{M} b^{\pi}_m,$$

where $M$ is the number of DIME action phases. For example, if for each phase $m$ one selects $b^{\pi}_m = 1$ for a military action, $0.9$ for a diplomatic action, and $0.8$ for an information or economic action, the mission weight will be larger for a policy with more military actions than other actions.
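The mission-weight formula reduces to an average over phases. A minimal sketch using the example weights from the text (1 for military, 0.9 for diplomatic, 0.8 for information or economic actions); the dictionary and function names, and the three-phase policy in the usage line, are hypothetical.

```python
# Per-phase action weights taken from the example in the text.
ACTION_WEIGHT = {"military": 1.0, "diplomatic": 0.9, "information": 0.8, "economic": 0.8}


def mission_weight(policy_actions, weights=ACTION_WEIGHT):
    """b^pi = (1/M) * sum over the M phases of the chosen action's weight."""
    if not policy_actions:
        raise ValueError("a DIME policy needs at least one phase")
    return sum(weights[a] for a in policy_actions) / len(policy_actions)


# A hypothetical three-phase DIME policy (one action per phase).
print(mission_weight(["diplomatic", "military", "economic"]))
```

This policy gives $(0.9 + 1 + 0.8)/3 = 0.9$; replacing the diplomatic action with a military one raises the weight, as the text describes.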


State  Peacekeeping  HA/DR  Stability Ops.  Major Combat Ops.
x1     1             1      1               1
x2     1             1      1               0
x3     1             1      0               1
x4     1             1      0               0
x5     1             0      1               1
x6     1             0      1               0
x7     1             0      0               1
x8     1             0      0               0

Table  1.  State  space  denoting  combinations  of  military  missions  (operations). 
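The eight states of Table 1 are the on/off combinations of the last three missions with peacekeeping always present. A small sketch that regenerates them as the binary status vectors $z(k)$ that eq. (1) consumes; the constant and function names are hypothetical.

```python
from itertools import product

MISSIONS = ("Peacekeeping", "HA/DR", "Stability Ops.", "Major Combat Ops.")


def table1_states():
    """States x1..x8 of Table 1 as binary vectors z = (z_1, ..., z_4):
    peacekeeping is present in every state, and the other three missions
    take all on/off combinations in the order used by the table."""
    return [(1,) + rest for rest in product((1, 0), repeat=3)]


def active_missions(z):
    """Names of the missions whose status z_i equals 1."""
    return [m for m, zi in zip(MISSIONS, z) if zi == 1]


for i, z in enumerate(table1_states(), start=1):
    print(f"x{i}", z, active_missions(z))
```

Listing `product((1, 0), repeat=3)` in that order reproduces the table's row ordering, with all missions present in x1 and only peacekeeping in x8.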
Figure 4. A DIME action network: the numbers between actions denote the resources required for each action in phase m of the mission.


(Distributed) SMDP at the OLC-TLC layer ~ $\langle \mathcal{X}, \mathcal{U}, \mathcal{P}(\mathcal{T}), \mathcal{R}(\mathcal{T}) \rangle$

We convert a mission, represented as an acyclic action-goal attainment (AGA) graph [14], to an SMDP $\langle \mathcal{X}, \mathcal{U}, \mathcal{P}(\mathcal{T}), \mathcal{R}(\mathcal{T}) \rangle$. Here the calligraphic (Euclid) letters denote the SMDP attributes at the OLC-TLC layer, and the time between decision epochs $\mathcal{T}$ at the OLC-TLC layer is a stochastic output of the tasks (sub-goals) being executed at the tactical level. In our previous work [14], we formulated and solved the problem of planning actions to achieve desired end goals (states) subject to resource and time constraints by employing a Markov decision process (MDP)-based method. That method addresses the problem of optimally selecting a sequence of actions to transform the mission environment from an initial state to a desired state. It begins by explicitly mapping an AGA graph to an MDP graph, and then develops a dynamic programming (DP) recursion to solve small MDP problems and limited-search AND/OR graph search techniques to solve large-scale MDP problems [14].


Figure 5. An AGA graph for an HA/DR operation.

In this paper, we transform the AGA graph (representing operational objectives or commander's intent in the form of DIME actions constituting a mission) into an SMDP. The purpose of the SMDP is to decide on alternative options to complete missions (developing task graphs), as well as on the sequencing of tasks (see Fig. 5). The distribution of the mission completion time, an output of the OLC-TLC SMDP, is shared with the SLC-OLC layer, where it is used to determine the time between decision epochs $T$ and the state transition probabilities. The AGA graph consists of OR nodes (representing alternative paths to accomplish the mission goals), AND nodes (representing sub-goals that are necessary to accomplish the mission goals), and XOR nodes (representing actions and/or goals that are in conflict or at odds with each other). These AGA graphs are transformed into SMDPs and solved via a DP recursion or its approximate variants.

The distributed SMDP for each mission at the OLC-TLC layer is formalized as follows (mathematical details are included in the Appendix).

• The state represents the combined status of the sub-goals in the AGA graph: each sub-goal is either accomplished or not (see Table 2).

• An action represents an option in a given state. Following the same line of reasoning as in [14], a set of control availability conditions determines whether a combination of actions is allowed (see Table 3).

• The state transition probabilities are defined as the probability of being in the next state at a decision epoch $\mathcal{T}$ time steps ahead (i.e., the holding time at the TLC), given the current state and an action.

• The rewards are related to the task difficulty and task accuracy of alternative paths in the AGA graph.
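One way to see how an AGA graph becomes an SMDP over sub-goal states is a small dynamic-programming recursion. The sketch below is a hypothetical simplification, not the method of [14]: the state is the set of accomplished sub-goals, an action is available once its precondition sub-goals are accomplished (AND structure), several actions may achieve the same sub-goal (OR structure), XOR conflicts are omitted, and the objective is the minimum expected completion time, where a failed attempt leaves the state unchanged. All action names and numbers are assumptions.

```python
from functools import lru_cache


def plan_aga(actions, end_goal):
    """Return J(state) -> (expected completion time, best action).

    actions: name -> (preconditions, achieved_subgoal, duration, p_success),
    with preconditions a frozenset of sub-goals. A failed attempt leaves the
    state unchanged, so J(s) = d + p*J(s') + (1-p)*J(s), i.e.
    J(s) = (d + p*J(s')) / p, with s' = s plus the achieved sub-goal.
    """
    @lru_cache(maxsize=None)
    def J(state):
        if end_goal in state:
            return 0.0, None                      # mission complete
        best = (float("inf"), None)               # dead end if nothing applies
        for name, (pre, goal, d, p) in actions.items():
            if goal in state or not pre <= state:
                continue                          # already done / not yet enabled
            value = (d + p * J(state | {goal})[0]) / p
            if value < best[0]:
                best = (value, name)
        return best

    return J


# Hypothetical mission: deliver relief by airlift, or secure a port then sealift.
actions = {
    "secure_port": (frozenset(), "port", 2.0, 0.5),
    "airlift": (frozenset(), "relief", 4.0, 1.0),
    "sealift": (frozenset({"port"}), "relief", 1.0, 1.0),
}
J = plan_aga(actions, "relief")
print(J(frozenset()))
```

With an unreliable port-securing task (success probability 0.5) the direct airlift is preferred (4.0 versus 5.0 time units); if securing the port always succeeded, the port-then-sealift route would take 3.0 and win. Because sub-goals are only ever added, the state graph is acyclic and memoized recursion suffices.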
Table 2. The state space representing the combined status of sub-goals in the AGA graph.
Table 3. Action set.
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This  is  how  the  process  works.  The  OLC-TLC  layer  is  provided  the  results  of  task 
completion  times  by  the  DMs  at  the  TLC.  Evidently,  the  task  completion  times  at  the 
TLC  constitute  the  decision  epochs  of  the  SMDP  at  the  OLC-TLC  layer.  The  task-level 
results  from  TLC  (an  output  of  task  scheduling  algorithm)  affect  the  state  transition 
probabilities  of  the  SMDP  process  at  the  OLC-TLC  layer.  The  holding  time  distribution 
function  0(T(x)|  x(k),  uj(k)),  probability  of  mission  being  completed  within  T(x)  time 
units  of  decision  epoch  k  at  the  OLC-TLC  layer  by  any  of  the  paths  in  the  AGA  graph,  is 
given  by: 

N  p 

F  (T  (k)  |  x(k), Uj (k))  =  (x(k  + 1), T  (*)),  (2) 

p= i 

where $N_p$ is the number of different paths through which the mission can be completed, and the termination condition is defined in terms of the makespan of the tasks on a path at the tactical level. The right-hand side of eq. (2) denotes the probability that at least one of the paths terminates in state $\{x(k+1), T(k)\}$ according to its termination condition $\omega_p$ (for details, the reader is referred to the Appendix). The goal of the SMDP at the OLC-TLC layer is to find a policy for each state (the best state-dependent action path in the AGA graph) using the task completion conditions provided by the DMs at the TLC and the mission weights provided by the SLC-OLC layer.
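Read as the probability that at least one of the $N_p$ alternative paths has terminated, eq. (2) is the complement of the event that none has. The Python sketch below assumes the per-path termination probabilities $\omega_p(x(k+1), T(k))$ are already supplied by the TLC and, as an added assumption the text does not state, that paths terminate independently; the function name `holding_time_cdf` is hypothetical.

```python
from math import prod


def holding_time_cdf(path_termination_probs):
    """Phi(T(k) | x(k), u_j(k)) as in eq. (2): probability that at least one
    of the N_p alternative AGA paths has terminated by T(k).

    path_termination_probs: omega_p(x(k+1), T(k)) for each path, in [0, 1].
    Paths are treated as independent (an assumption of this sketch).
    """
    for w in path_termination_probs:
        if not 0.0 <= w <= 1.0:
            raise ValueError("termination probabilities must lie in [0, 1]")
    # P(no path has terminated) is the product of the complements.
    return 1.0 - prod(1.0 - w for w in path_termination_probs)


# Hypothetical termination probabilities for three alternative paths.
print(holding_time_cdf([0.2, 0.5, 0.1]))  # about 0.64
```

With no paths the product is empty and the sketch returns 0, i.e. the mission cannot terminate.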

The  overall  coordination  process  is  formalized  as  shown  in  Table  4. 


1. Initialize the SMDP at the SLC-OLC layer, $\langle X(0), U(0), P(T(0)), R(T(0)) \rangle$, and the SMDPs at the OLC-TLC layer, $\{\langle S(0), Y(0), \Pi(T(0)), \mathcal{R}(T(0)) \rangle_i\}_{i=1}^{N}$, where $N$ is the number of missions.

2. Decide on the DIME action policy by solving the SMDP problem at the SLC-OLC layer with the current information on mission completion times from the OLC-TLC layer. Transmit the mission weights $b^{\pi}$ to the OLC-TLC layer.

3. The DMs at the TLC provide the task-asset assignment results, with the task completion conditions, to the SMDPs at the OLC-TLC layer.

4. Using the information from steps 2 and 3, each OLC-TLC layer mission planner decides on a state-dependent action path in the AGA mission graph and on the mission completion time. Transmit the mission completion time to the SLC-OLC layer.

5. Repeat steps 2 to 4 until the policies at each layer have converged.


Table  4.  The  overall  coordination  process 
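A minimal sketch of the loop in Table 4, assuming each layer's SMDP solver can be treated as a black box. The names `coordinate`, `toy_slc_olc` and `toy_olc_tlc` are hypothetical, and the two toy solvers are placeholders chosen only so that the example terminates; they are not the paper's models.

```python
def coordinate(solve_slc_olc, solve_olc_tlc, missions, completion_times, max_rounds=50):
    """Steps 2-5 of Table 4: alternate the SLC-OLC and OLC-TLC solvers until
    the policies at both layers stop changing."""
    times = dict(completion_times)  # step 1: initial mission completion times
    previous = None
    for rounds in range(1, max_rounds + 1):
        # Step 2: SLC-OLC layer chooses a DIME policy and mission weights.
        slc_policy, weights = solve_slc_olc(times)
        # Steps 3-4: each OLC-TLC mission planner (fed by TLC task results,
        # hidden inside the solver here) picks an AGA action path and reports
        # the resulting mission completion time upward.
        olc_policies = {}
        for m in missions:
            olc_policies[m], times[m] = solve_olc_tlc(m, weights[m])
        current = (slc_policy, olc_policies)
        if current == previous:  # step 5: no policy changed this round
            return current, times, rounds
        previous = current
    raise RuntimeError("layers did not converge within max_rounds")


# Hypothetical stand-ins for the two SMDP solvers (not the paper's models).
PATH_TIMES = {"M1": {"fast": 10.0, "cheap": 14.0}, "M2": {"fast": 8.0, "cheap": 12.0}}


def toy_slc_olc(times):
    total = sum(times.values())
    weights = {m: 1.0 + t / total for m, t in times.items()}
    slowest = max(times, key=times.get)
    return {"dime": "military" if slowest == "M1" else "economic"}, weights


def toy_olc_tlc(mission, weight):
    path = "fast" if weight > 1.4 else "cheap"
    return {"path": path}, PATH_TIMES[mission][path]


policies, times, rounds = coordinate(
    toy_slc_olc, toy_olc_tlc, ["M1", "M2"], {"M1": 14.0, "M2": 12.0})
print(rounds, policies, times)
```

Convergence is detected as two consecutive rounds producing identical policies at both layers, which is one concrete reading of step 5.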
IV.  OPERATIONAL  MODEL  FOR  HIERARCHICAL  HOLONIC  PLANNING

In our previous work [7], the three-level (SLC-OLC-TLC) model of the C2 holonic reference architecture (HRA) for planning and executing multiple missions was considered. The model included the mission and its decomposition into a task graph, asset allocation, and task scheduling. Those elements of the model are also used in this work, with a focus on mission planning issues involving DIME actions. We consider the following example for illustrative purposes.

Missions: MHQ with MOC is assigned to complete two military missions that occur in geographically separated areas: mission 1, capturing a seaport to allow the introduction of follow-on forces (major combat operation), and mission 2, rescue activity after a hurricane in the homeland (HA/DR). Fig. 6 shows the geographical situation in this area [10]. We assume that a single mission has multiple alternative paths for completing it [7]. Mission state 3 is the initial state, in which MHQ with MOC is tasked to execute a major military operation and an HA/DR.
Figure 7. Task resource requirements (left) and asset resource capabilities (right).
The resource requirements for each task and the resource capabilities of each asset are presented in Fig. 7. The resource vector consists of 8 attributes: AAW (Anti-Air Warfare), ASUW (Anti-Surface Warfare), ASW (Anti-Submarine Warfare), GASLT (Ground Assault), FIRE (Artillery), ARM (Armor), MINE (Mine Clearing) and DES (Designation). We note that each task needs to be processed by a combination of assets.
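To illustrate what processing a task "by a combination of assets" can mean for these 8-attribute vectors, the sketch below checks whether the summed capabilities of chosen assets meet a task's requirement on every attribute, and finds the smallest covering asset sets. The additive-capability rule, the helper names and all numbers are assumptions for illustration; they are not the values of Fig. 7.

```python
from itertools import combinations

# Attribute order as listed in the text (Fig. 7).
ATTRIBUTES = ("AAW", "ASUW", "ASW", "GASLT", "FIRE", "ARM", "MINE", "DES")


def covers(requirement, assets, chosen):
    """True if the summed capabilities of the chosen assets meet the task
    requirement on every attribute (additive capabilities assumed)."""
    combined = [sum(assets[a][k] for a in chosen) for k in range(len(ATTRIBUTES))]
    return all(c >= r for c, r in zip(combined, requirement))


def smallest_covering_sets(requirement, assets):
    """All asset combinations of minimum size that cover the requirement."""
    for size in range(1, len(assets) + 1):
        found = [list(c) for c in combinations(range(len(assets)), size)
                 if covers(requirement, assets, c)]
        if found:
            return found
    return []


# Hypothetical task requirement and asset capabilities (not the Fig. 7 values).
task = [0, 2, 0, 5, 3, 0, 0, 1]
assets = [
    [1, 2, 0, 0, 2, 0, 0, 1],  # asset 0
    [0, 0, 0, 5, 1, 2, 0, 0],  # asset 1
    [3, 3, 2, 0, 0, 0, 1, 0],  # asset 2
]
print(smallest_covering_sets(task, assets))  # [[0, 1]]
```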


DIME action sequencing (Future Plans): The SLC-OLC layer manages multiple missions; it provides guidance for future plans by using the SMDP to specify the sequence of DIME actions to be planned and executed. From the optimal SMDP action set, the feasible action-paths for missions 1 and 2 are computed by taking the mission difficulty factor $\alpha_i$ in eq. (4) in the Appendix to lie in $[0.7, 0.9]$ for HA/DR, $[0.9, 1.1]$ for stability operations, and $[1.1, 1.3]$ for major combat operations.

The feasible action-paths for missions 1 and 2 are shown in Fig. 8. There are 48 action-paths for mission 1, with upper bound (resource requirement) $q_1 = 33.3$, and 18 action-paths for mission 2, with $q_2 = 22.88$ (see eq. (5) in the Appendix).
Figure 8. The feasible DIME action-paths as computed at the SLC-OLC layer.

The state transition probabilities $P(x(k+1) \mid T(k), x(k), u_j(k))$ associated with the SMDP at the SLC-OLC layer are obtained by assuming that they are related to the ratio of the resources allocated, $g_{hi}$, for mission $i$ in state $x_h$ to the resources required, $q_i$ (see eq. (7) in the Appendix):

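$$P(x(k+1) \mid T(k), x(k), u_j(k)) = \frac{\prod_{i=1}^{N} \left( g_{hi}/q_i \right)^{z_{hi}(k)\,(1 - z_{hi}(k+1))}}{1 + \sum_{h=1,\ h(k) \neq h(k+1)}^{n_h} \prod_{i=1}^{N} \left( g_{hi}/q_i \right)^{z_{hi}(k)\,(1 - z_{hi}(k+1))}} \qquad (3)$$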
The numerator in eq. (3) is a function of the resource allocation ratio and of whether the status of mission $i$ in state $x_h$ has changed from 1 to 0, i.e., $z_{hi}(k)(1 - z_{hi}(k+1))$ is 1 only if $z_{hi}(k) = 1$ and $z_{hi}(k+1) = 0$. For example, if the current state is $x_1$: $z = [1\ 1\ 1\ 1]$ and the next state is $x_2$: $z = [1\ 1\ 1\ 0]$, the state transition vector $z_{hi}(k)(1 - z_{hi}(k+1))$ is $[0\ 0\ 0\ 1]$. The holding time distribution function $F(T(k) \mid x(k), u_j(k))$ is calculated by eq. (1) once the OLC-TLC layer provides it as the SMDP solution. The reward (cost) of a feasible action-path of a mission is calculated as $g_i = \sum_{(l,m) \in A_i} c_{ilm}$, $c_{ilm} \in C_i$, in eq. (4), and the reward (cost) of each state-action pair is computed as $R(x(k), u_j(k)) = \sum_{i=1}^{N} R(z_i(k), u_j(k)) = \sum_{i=1}^{N} g_i(k) z_i(k)$ (see Appendix).
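A small Python sketch of the transition rule in eq. (3) and of the state-action reward, under stated assumptions: the allocation ratios $g_{hi}/q_i$ are supplied per mission, candidate next states exclude the current state, and the leading 1 in the denominator is read as the weight of staying in $x(k)$. All function names and numbers are hypothetical. The example reproduces the transition vector $[0\ 0\ 0\ 1]$ for $x_1 \to x_2$ from the text.

```python
from math import prod


def transition_indicator(z_now, z_next):
    """z_hi(k) * (1 - z_hi(k+1)): 1 exactly where a mission goes from present to absent."""
    return [a * (1 - b) for a, b in zip(z_now, z_next)]


def transition_probs(z_now, candidates, alloc_ratio):
    """Sketch of eq. (3). Each candidate next state is weighted by the product
    of allocation ratios g_hi/q_i over the missions it completes; staying in
    z_now carries weight 1 (read as the '1 +' term of the denominator)."""
    weights = {}
    for z_next in candidates:
        if list(z_next) == list(z_now):
            continue  # the current state is handled by the '1 +' term
        exps = transition_indicator(z_now, z_next)
        weights[tuple(z_next)] = prod(r ** e for r, e in zip(alloc_ratio, exps))
    denom = 1.0 + sum(weights.values())
    probs = {z: w / denom for z, w in weights.items()}
    probs[tuple(z_now)] = 1.0 / denom
    return probs


def state_action_reward(path_costs, z_now):
    """R(x(k), u_j(k)) = sum_i g_i(k) z_i(k) over the missions still present."""
    return sum(g * z for g, z in zip(path_costs, z_now))


z1 = [1, 1, 1, 1]
print(transition_indicator(z1, [1, 1, 1, 0]))  # [0, 0, 0, 1]
# Hypothetical allocation ratios g_hi/q_i for the four missions.
print(transition_probs(z1, [[1, 1, 1, 0], [1, 1, 0, 1]], [0.5, 0.8, 0.6, 0.9]))
```

Because the stay probability and the candidate probabilities share one denominator, the returned distribution sums to 1 by construction.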


Mission  Decomposition  (Future  Operations):  The  OLC-TLC  layer  provides  plans  for 
future  operations;  it  devises  plans  for  missions  that  include  the  mission  decomposition 
and  exploring  alternative  options  (paths)  in  the  AGA  graph,  such  as  those  in  Fig.  9,  to 
select  the  best  option. 




Figure  9.  Alternative  options  on  the  AGA  graph  for  the  two  missions. 

In addition, we use the asset allocation plan and the mission scenario from our previous work [5,7]. A set of tasks with specified resource requirements, locations, and precedence relations needs to be processed by the organization. The tasks are assigned to DMs based on the fit between the resource requirements of the tasks and the resource capabilities of the DMs. The assigned DMs select and send their assets to the locations where the tasks appear in order to execute them with minimum lead time and maximum accuracy. The probability of the termination condition $\omega_p$ for each alternative path is calculated as the task success probability $g_i(k)/q_i(k)$, the ratio of the resource capabilities of the assets, $g_i$, to the resource requirements of the tasks, $q_i$, based on the asset allocation and task execution activities at the tactical level (see Fig. 10) in [7]: $\omega_p = \bigl(1/\sum_{i=1}^{n_t} t_i(k)\bigr) \sum_{i=1}^{n_t} [g_i(k)/q_i(k)]\, t_i(k)$, where $n_t$ is the number of tasks and $t_i \in \{0, 1\}$ denotes the status of task $i$, with $t_i = 1$ for the presence of the task and 0 for its absence.
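The termination probability $\omega_p$ of a path averages the capability-to-requirement ratios over the tasks on it. The sketch below assumes the average is taken over the tasks with $t_i = 1$ and, as an added assumption, caps each ratio at 1 so that excess capability cannot push the value above a probability; `path_termination_prob` is a hypothetical name.

```python
def path_termination_prob(g, q, present):
    """omega_p: mean of min(1, g_i/q_i) over the tasks present on the path.

    g: resource capabilities assigned to each task, q: resource requirements,
    present: task status t_i in {0, 1}. Capping at 1 is an added assumption.
    """
    ratios = [min(1.0, gi / qi) for gi, qi, ti in zip(g, q, present) if ti]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Hypothetical path with three tasks, the third not present.
print(path_termination_prob([4, 3, 5], [4, 6, 5], [1, 1, 0]))  # 0.75
```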


Figure 10. The operational scenarios for two missions (schedule of the task units of missions M1 (TU1 to TU3) and M2 (TU1 to TU4) with their assigned assets and tasks over time units 0 to 8; legend: feasible schedule, coordinating schedule).


The task paths and their success rates for mission 2 in state 1 are shown in Table 5. We assume that the termination condition $\omega_p$ is uniformly distributed over the holding time $T(k)$.
Table 5. Task success rate for each alternative path of mission 2.
The holding time distribution function $\Phi(T(k) \mid x(k), u_j(k))$ for an action $u_j(k)$ in state $x(k)$ is calculated using the distribution of the termination condition $\omega_p$ in eq. (2). The state transition probability $\Pi(x(k+1) \mid T(k), x(k), u_j(k))$, given the current state $x(k)$ and an action $u_j(k)$, is obtained from the task success probability. Using eq. (10) in the Appendix, the state transition probabilities at the OLC-TLC layer, $\{\Pi(x(k+1), T(k) \mid x(k), u_j(k))\}$, are obtained as shown in Table 6.
Table 6. The state transition matrix for mission 2 ($u$: action).


The mission completion reward $\mathcal{R}(T(k) \mid x(k), u_j(k))$ for the SMDP at the OLC-TLC layer is calculated by eq. (11) in the Appendix. The reward $\mathcal{R}(x(k), u_j(k))$ of an action is obtained as $\mathcal{R}(x(k), u_j(k)) = q_i(k)/\max q_i(k)$, which estimates mission difficulty in terms of the resource requirements $q_i$ of the tasks. The additional reward over the holding time $T(k)$ is obtained in terms of resource redundancy for tasks, i.e., $r(T(k)) = r_d(T(k))/\max r_d(T(k))$, where $r_d(T(k))$ is the resource redundancy for tasks during the holding time $T(k)$. The reward structure of the SMDP at the OLC-TLC layer is shown in Table 7; zero entries denote that there are no transitions between the corresponding states.
Table 7. The reward structure of mission 2 at the OLC-TLC layer.
The SMDP is solved via a DP recursion (value iteration) [13]. The optimal policy for each state is shown in Table 8.
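The paper cites [13] for the value iteration; the sketch below is a generic value iteration over decision epochs, not the authors' code. It assumes the joint probabilities $\Pi(x(k+1), T(k) \mid x(k), u_j(k))$ have already been summed over holding times, discounts once per decision epoch as in eqs. (9) and (12), and maximizes reward as at the OLC-TLC layer (flipping the comparison gives the cost-minimizing variant used at the SLC-OLC layer). The three-state model is hypothetical.

```python
def smdp_value_iteration(states, actions, transitions, reward,
                         beta=0.9, tol=1e-9, max_iter=10_000):
    """Value iteration for a discounted SMDP, maximizing expected reward.

    actions[x]           -> feasible actions in x (empty for absorbing states)
    transitions[(x, u)]  -> list of (next_state, probability), i.e. the joint
                            Pi(x', T | x, u) already summed over holding times T
    reward[(x, u)]       -> expected reward for choosing u in x
    beta                 -> discount applied once per decision epoch
    """
    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        new_V, policy = {}, {}
        for x in states:
            if not actions[x]:
                new_V[x] = 0.0  # absorbing state earns nothing further
                continue
            best_u, best_q = None, float("-inf")
            for u in actions[x]:
                q = reward[(x, u)] + beta * sum(p * V[x2] for x2, p in transitions[(x, u)])
                if q > best_q:
                    best_u, best_q = u, q
            new_V[x], policy[x] = best_q, best_u
        delta = max(abs(new_V[x] - V[x]) for x in states)
        V = new_V
        if delta < tol:
            return V, policy
    raise RuntimeError("value iteration did not converge")


# Hypothetical three-state mission model.
states = ["s0", "s1", "done"]
actions = {"s0": ["patrol", "strike"], "s1": ["secure"], "done": []}
transitions = {
    ("s0", "patrol"): [("s1", 1.0)],
    ("s0", "strike"): [("done", 0.5), ("s0", 0.5)],
    ("s1", "secure"): [("done", 1.0)],
}
reward = {("s0", "patrol"): 1.0, ("s0", "strike"): 2.0, ("s1", "secure"): 3.0}
V, policy = smdp_value_iteration(states, actions, transitions, reward)
print(policy)  # {'s0': 'patrol', 's1': 'secure'}
```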
Table 8. The optimal policy for mission 2 at the OLC-TLC layer.


Deliberate Planning (Current Operations): The OLC-TLC layer also provides current operations plans by selecting the best options for a mission based on the optimal policy (actions). For example, the DM decides on a patrol action for securing the area (A2.5), thereby maximizing the operational-level rewards while mission 2 is being completed using path 8 on the AGA graph (see Table 5), i.e., $t = x_{17} = [1\ 1\ 1\ 1\ 1\ 0\ 1\ 1]$, with holding time $T(k) = 14$ and termination condition $\omega_p = 0.05$ (see Fig. 11).


Figure 11. Optimal actions in state 16 for mission 2.

The DM at the SLC-OLC layer is then provided with the operational information from each DM at the OLC-TLC layer to solve the SMDP problem at the SLC-OLC layer; i.e., the termination conditions $\omega_p$ of the alternative paths affect the mission termination condition $\omega_i$ and the optimal value function at the OLC-TLC layer. These, in turn, are used as the rewards over the holding time $T(k)$ at the SLC-OLC layer. As defined in Section IV, the termination condition $\omega_i$ for mission $i$ is the termination condition of the alternative path with the maximum completion time (makespan) for the mission at the OLC-TLC layer. The termination conditions, the maximum completion times for the missions, and the corresponding paths are shown in Table 9.
Table 9. The terminal conditions $\omega_i$ for missions 1 and 2.
The holding time distributions $F(T(k) \mid x(k), u_j(k))$ are obtained by eq. (1); using them in eqs. (3) and (6) in the Appendix, we obtain the overall state transition probabilities $P(x(k+1) \mid T(k), x(k), u_j(k))$ at the SLC-OLC layer. There are $n_u$ matrices of dimension $n_h \times n_h$, where $n_u = 57{,}722$, built from 18 HA/DR, 61 stability operations and 48 major combat operations action-paths.


The reward structure $R(T(k) \mid x(k), u_j(k))$ is obtained by eq. (8) in the Appendix, i.e., as the sum of the usage cost at the SLC-OLC layer, $R(x(k), u_j(k))$, and the expected total reward $\tilde V^{\pi}(x)$ of an alternative option to complete any mission at the OLC-TLC layer. Instead of summing the usage cost and the expected reward directly, we normalize them as desirability values, replacing $R(x(k), u_j(k))$ by $\exp[-R(x(k), u_j(k))/\max R(x(k), u_j(k))]$ and $\tilde V^{\pi}(x)$ by $\exp[\operatorname{mean}(\tilde V^{\pi}(x))/\max \tilde V^{\pi}(x)]$. The result of the SMDP at the SLC-OLC layer using these state transition probabilities and this reward structure is shown in Fig. 12 in terms of future plans for MHQ/MOC. The two layers may iterate until the decisions at the two layers are congruent.
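The two normalizations, and their sum as in eq. (8), can be written directly. This sketch assumes positive usage costs and a positive maximum of the OLC-TLC value function; the function names and values are hypothetical.

```python
from math import exp
from statistics import mean


def usage_desirability(cost, all_costs):
    """exp(-R / max R): larger usage cost gives lower desirability (costs > 0 assumed)."""
    return exp(-cost / max(all_costs))


def olc_value_desirability(olc_values):
    """exp(mean(V) / max V) over the OLC-TLC value function (max V > 0 assumed)."""
    return exp(mean(olc_values) / max(olc_values))


def combined_reward(cost, all_costs, olc_values):
    """Eq. (8) applied to the normalized terms: usage part plus OLC-TLC part."""
    return usage_desirability(cost, all_costs) + olc_value_desirability(olc_values)


# Hypothetical usage costs of two actions and OLC-TLC state values.
print(combined_reward(4.0, [2.0, 4.0], [3.0, 5.0, 4.0]))  # exp(-1) + exp(0.8)
```

The exponential keeps both terms positive and on comparable scales, which is the stated reason for normalizing rather than summing raw values.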
Figure 12. The information flow between the SMDP processes at the interface layers.


V.  CONCLUSIONS  AND  FUTURE  WORK 

In  this  paper,  we  developed  the  rudiments  of  a  C2  holonic  reference  architecture  that  is 
applicable  to  Navy  MHQs  with  MOC  for  assessing,  planning  and  executing  multiple 
missions  and  tasks  across  a  range  of  military  operations.  We  model  the  coordination 
issues  inherent  in  the  MHQ  with  MOC  via  a  three-level  holonic  reference  architecture 
that  links  tactical  and  strategic  levels  of  decision  making.  We  sought  to  demonstrate  that 
the C2 coordination issues at the three levels, viz., strategic, operational and tactical levels,
associated  with  DIME  actions  (future  plans),  and  mission  planning  (future  operations  and 
current  operations)  can  be  modeled  using  SMDP  formalisms  within  the  proposed  holonic 
architecture.  The  two  layers  share  the  results  of  individual  SMDP  problems  at  each  level 
while  the  distributed  SMDPs  at  the  OLC-TLC  layer  solve  individual  mission  planning 
problems.  The  approach  is  illustrated  using  a  representative  scenario  involving  multiple 
missions. 


APPENDIX 

SMDP formulation at the SLC-OLC layer ~ $\langle X, U, P(T), R(T) \rangle$

State space, X: The mission environment is assumed to have $n_h$ states. Each state $x(k)$, defined at the beginning of decision epoch $k$, denotes a combination of missions; it is assumed to belong to the set $X = \{x_h\}_{h=1}^{n_h}$. If there are $N$ missions and we let $z_i \in \{0, 1\}$ denote the status of mission $i$, where $z_i = 1$ denotes the presence of the mission and 0 its absence, then the state $x_h$ is represented by an $N$-dimensional binary vector $z = [z_1\ z_2 \ldots z_N]$, and the number of states $n_h$ can be at most $2^N$ (in practice much less; see Table 1). Table 1 shows the different states (one per row) of missions, along with the operational attributes (presence, absence) characterizing them. For example, state $x_3 = [1\ 1\ 0\ 1]$ corresponds to $z_1 = 1$, $z_2 = 1$, $z_3 = 0$ and $z_4 = 1$. That is, this state is characterized by {Peacekeeping, HA/DR, Major combat operations}.
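A sketch of the binary state encoding, with one added assumption for illustration: between decision epochs, missions are only completed ($z_i$ going from 1 to 0), never added, which bounds the successors of a state. The helper names are hypothetical.

```python
from itertools import product


def all_states(n_missions):
    """Every binary mission-status vector z = [z_1 ... z_N]; 2^N in total."""
    return [list(z) for z in product((0, 1), repeat=n_missions)]


def present_missions(z):
    """1-based indices i with z_i = 1, i.e. the missions present in the state."""
    return [i + 1 for i, zi in enumerate(z) if zi]


def successors(z):
    """Hypothetical reachability rule: missions are only completed (1 -> 0)
    between decision epochs, so successors lie componentwise below z."""
    return [s for s in all_states(len(z))
            if s != list(z) and all(si <= zi for si, zi in zip(s, z))]


x3 = [1, 1, 0, 1]
print(present_missions(x3))  # [1, 2, 4]
print(len(successors(x3)))   # 7
```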

Action set, U: An action $u_j(k)$, defined at the beginning of decision epoch $k$ in state $x$, is assumed to belong to the set $U_k(x) = \{u_j(x)\}_{j=1}^{n_u}$, where $n_u$ is the number of feasible action sequences. The feasibility of action sequences is determined by solving a shortest-path
problem  and  a  longest-path  problem  using  Dijkstra’s  algorithm  [11]  based  on  the  DIME 
resource  requirements  of  a  mission  in  the  network.  We  employed  a  normalized  CAMEO 
scale  [12]  to  obtain  link  costs  in  this  network. 
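The text computes the shortest and longest path costs with Dijkstra's algorithm. As a plainly swapped-in alternative, the sketch below uses a dynamic program over a topological order (Kahn's algorithm), which on an acyclic network yields both the minimum and the maximum path cost exactly; Dijkstra's algorithm is not, in general, suited to longest paths. The DIME network, its node names and its costs are hypothetical.

```python
import math
from collections import defaultdict, deque


def dag_shortest_longest(edges, source, target):
    """Minimum and maximum total cost of any source->target path in a DAG.

    edges: iterable of (u, v, cost). Uses a topological order (Kahn's
    algorithm), then relaxes each node's outgoing edges once for both bounds.
    """
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v, c in edges:
        succ[u].append((v, c))
        indeg[v] += 1
        nodes.update((u, v))
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for v, _ in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("network contains a cycle")
    shortest = {n: math.inf for n in nodes}
    longest = {n: -math.inf for n in nodes}
    shortest[source] = longest[source] = 0.0
    for n in order:
        if shortest[n] == math.inf:
            continue  # not reachable from the source
        for v, c in succ[n]:
            shortest[v] = min(shortest[v], shortest[n] + c)
            longest[v] = max(longest[v], longest[n] + c)
    return shortest[target], longest[target]


# Hypothetical DIME network with normalized link costs.
edges = [("START", "D", 2), ("START", "I", 3), ("D", "M", 4), ("I", "M", 1),
         ("D", "E", 6), ("M", "END", 2), ("E", "END", 1)]
print(dag_shortest_longest(edges, "START", "END"))  # (6.0, 9.0)
```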

Let $c_{ilm}$ denote the required resources for action $l$ in phase $m$ of the network for mission $i$. These can be represented as a DIME resource usage cost matrix $C_i$, where $C_i$ is an $L \times M$ matrix:


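$$C_i = \alpha_i\, C \circ F_i = \begin{bmatrix} c_{i11} & \cdots & c_{i1M} \\ \vdots & \ddots & \vdots \\ c_{iL1} & \cdots & c_{iLM} \end{bmatrix}, \qquad c_{ilm} \ge 0. \qquad (4)$$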


Here, $\alpha_i$ is the mission difficulty factor, $L$ is the maximum number of actions in a phase (rows of the $C_i$ matrix) and $M$ is the number of phases (columns of the $C_i$ matrix).


In addition, $C$ is a (mission-independent) DIME resource usage cost matrix including all possible DIME actions in an area, and $F_i$ is a matrix of feasible action sets for each mission:

$$F_i = [f_{ilm}], \qquad f_{ilm} = \begin{cases} 1 & \text{if action } l \text{ can be used for mission } i \text{ in phase } m, \\ 0 & \text{otherwise.} \end{cases}$$


In eq. (4), $\circ$ denotes the Hadamard (Schur) product, i.e., $c_{ilm} = \alpha_i\, c_{lm} f_{ilm}$. Let $\rho(t)$ and $\gamma(t)$ denote the shortest and longest distances (minimum and maximum resource costs) of any path from a source node $s$ to a destination node $t$, computed using, for example, Dijkstra's algorithm¹ [11]. We assume that these costs specify an upper bound on the path costs that the strategic-level decision maker is willing to commit to a mission. Thus, letting $q_i$ be an upper bound on the resource requirement for mission $i$, we assume the following generalized mean with exponent $a$ for $q_i$:


$$q_i = f(\rho(t), \gamma(t)) = m_a(\rho(t), \gamma(t)) = \left( \frac{\rho(t)^a + \gamma(t)^a}{2} \right)^{1/a} \qquad (5)$$


The value of $a$ allows us to model a variety of behaviors at the strategic level. When $a \to -\infty$, $q_i$ is the minimum resource cost (shortest distance); when $a = -1$, $q_i$ is the harmonic mean of the minimum and maximum resource costs, $2\rho(t)\gamma(t)/[\rho(t) + \gamma(t)]$; when $a = 1$, $q_i$ is the arithmetic mean (average) of the minimum and maximum resource costs, $[\rho(t) + \gamma(t)]/2$; when $a = 2$, $q_i$ is the root mean square of the minimum and maximum resource costs; and when $a \to \infty$, $q_i$ is the maximum resource cost (longest distance). In the simulations, we assume $a = 1$. Given an upper bound $q_i$ on resource usage, all paths with length $\le q_i$ comprise the feasible action paths $A_i$ for mission $i$. Letting $L_i = |A_i|$, the cardinality of the action set is $|U_k(x_h)| = |U_k(z)| = \prod_{i=1}^{N} (L_i + 1)^{z_i}$.
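The pieces of eqs. (4) and (5) and the feasibility rule can be sketched together. Two points are assumptions: the cardinality formula $\prod_i (L_i + 1)^{z_i}$ reads each present mission as taking one of its $L_i$ action-paths or none, and the $a = 0$ branch uses the standard geometric-mean limit, which the text does not discuss. With the path counts quoted in Section IV (18, 61 and 48) that formula gives 57,722. All function names and example values are hypothetical.

```python
import math


def mission_cost_matrix(alpha, C, F):
    """Eq. (4): C_i = alpha_i * (C o F_i), o being the elementwise (Hadamard) product."""
    return [[alpha * c * f for c, f in zip(c_row, f_row)] for c_row, f_row in zip(C, F)]


def generalized_mean(rho, gamma, a):
    """Eq. (5): ((rho^a + gamma^a) / 2)^(1/a), with the limiting cases."""
    if a == math.inf:
        return max(rho, gamma)   # longest distance
    if a == -math.inf:
        return min(rho, gamma)   # shortest distance
    if a == 0:
        return math.sqrt(rho * gamma)  # geometric-mean limit (assumption)
    return ((rho ** a + gamma ** a) / 2) ** (1 / a)


def feasible_paths(path_costs, q):
    """Action paths whose total resource cost does not exceed the bound q_i."""
    return [p for p, cost in path_costs.items() if cost <= q]


def action_set_size(path_counts, z):
    """prod_i (L_i + 1)^{z_i}: each present mission takes one of its L_i paths or none."""
    return math.prod((L + 1) ** zi for L, zi in zip(path_counts, z))


print(generalized_mean(2.0, 8.0, 1))             # 5.0 (arithmetic mean)
print(action_set_size([18, 61, 48], [1, 1, 1]))  # 57722
```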


State transition probabilities, $\{P(x(k+1) \mid T(k), x(k), u_j(k))\}$: given the current state $x(k)$ and an action $u_j(k)$, the probability that $\{x(k+1), T(k)\}$ is the next state at the decision epoch $T(k)$ time steps ahead (i.e., after the holding time) is denoted by $P(x(k+1), T(k) \mid x(k), u_j(k))$. Evidently,

$$P(x(k+1), T(k) \mid x(k), u_j(k)) = P(x(k+1) \mid T(k), x(k), u_j(k))\, F(T(k) \mid x(k), u_j(k)) \qquad (6)$$

where $P(x(k+1) \mid T(k), x(k), u_j(k))$ denotes the state transition probability given the current state $x(k)$, an action $u_j(k)$, and the holding time $T(k)$ prior to the transition to state $x(k+1)$ at the SLC-OLC layer, and $F(T(k) \mid x(k), u_j(k))$ is the holding time distribution function, i.e., the probability that the next decision epoch occurs within $T(k)$ time units of the current decision epoch $k$, given the current state $x(k)$ and action $u_j(k)$ (see eq. (1)).


1 Since the graph is acyclic, both the shortest and longest path lengths can be computed using Dijkstra's algorithm.
We define $P(x(k+1) \mid T(k), x(k), u_j(k))$ as the probability that, once action $u_j(k)$ is initiated in state $x(k)$ at decision epoch $k$, the process does not terminate until $T(k)$ time units have elapsed, at which point it transitions to state $x(k+1)$. Thus,

$$P(x(k+1) \mid T(k), x(k), u_j(k)) = \sum_{x' \in X} \Bigl[ P(x' \mid T(k) - T_0, x(k), u_j(k)) \prod_{i=1}^{N} \bigl(1 - \omega_i(x', T(k) - T_0)\, z_i(k)\bigr)\, P(x(k+1) \mid T_0, x', u_j(k)) \Bigr] \qquad (7)$$

where $T_0$ is a single time unit and $\omega_i(x', T(k) - T_0)$ is the termination probability of mission $i$. The first term denotes the probability of executing the missions for $T(k) - T_0$ time units and reaching an intermediate state $x'$; the second term denotes the probability that none of the missions terminates in state $\{x', T(k) - T_0\}$ according to its termination condition $\omega_i$; and the last term denotes the probability that at least one of the missions is completed in a single time step $T_0$, so that the state transitions to $x(k+1)$. We obtain the distribution of the termination condition $\omega_i$ for mission $i$ in terms of the maximum completion time (makespan) of the alternative options (paths) for the mission at the OLC-TLC layer.
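The recursion in eq. (7) can be evaluated with memoization. This sketch fixes the action, assumes the holding time is an integer multiple of $T_0$, takes the single-step probabilities $P(\cdot \mid T_0, x', u_j)$ and the termination probabilities $\omega_i$ as given, and uses the status vector $z(k)$ of the starting state in the survival product, as the equation is written. Names and numbers are hypothetical.

```python
from functools import lru_cache
from math import prod


def make_multi_step(P1, omega, T0=1):
    """Return P(x_next | T, x) for a fixed action, following eq. (7).

    P1[x'][x_next] -> single-step probability P(x_next | T0, x', u)
    omega(i, x', t) -> termination probability of mission i in x' after t units
    States are tuples of mission statuses z; T must be a multiple of T0.
    """
    states = list(P1)

    @lru_cache(maxsize=None)
    def P(x_next, T, x):
        if T == 0:
            return 1.0 if x_next == x else 0.0  # no time elapsed: still in x
        total = 0.0
        for xp in states:
            reach = P(xp, T - T0, x)  # first term: reach x' after T - T0
            if reach == 0.0:
                continue
            # Second term: no mission present in x terminates in {x', T - T0}.
            survive = prod(1.0 - omega(i, xp, T - T0) * zi for i, zi in enumerate(x))
            # Last term: one more step of length T0 leads to x_next.
            total += reach * survive * P1[xp].get(x_next, 0.0)
        return total

    return P


# Hypothetical two-mission example with a constant termination probability.
P1 = {
    (1, 1): {(1, 1): 0.5, (1, 0): 0.3, (0, 1): 0.2},
    (1, 0): {(1, 0): 1.0},
    (0, 1): {(0, 1): 1.0},
    (0, 0): {(0, 0): 1.0},
}
P = make_multi_step(P1, lambda i, s, t: 0.1)
print(P((1, 0), 2, (1, 1)))  # about 0.295245
```

Because the survival factor removes the probability mass of early terminations, the returned values for a given $T$ need not sum to 1.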

Reward (Cost) structure, $\{R(T(k) \mid x(k), u_j(k))\}$: The reward function $R(T(k) \mid x(k), u_j(k))$ denotes the expected reward until the next state is reached within $T(k)$ time units of the current decision epoch $k$, given the current state $x(k)$ and an action $u_j(k)$. It is defined as the sum of the usage cost at the SLC-OLC layer and the expected total reward $\tilde V^{\pi}(x)$ of an alternative option to complete any mission at the OLC-TLC layer. The reward at the OLC-TLC layer is provided as the expected cumulative reward obtained by solving the SMDP problem at the OLC-TLC layer; thus, the reward for taking action $u_j(k)$ in state $x(k)$ during $T(k)$ is defined as:


$$R(T(k) \mid x(k), u_j(k)) = R(x(k), u_j(k)) + \tilde V^{\pi}(x) = \sum_{i=1}^{N} R(z_i(k), u_j(k)) + \tilde V^{\pi}(x). \qquad (8)$$

The first term in eq. (8), $R(x(k), u_j(k))$, is the sum of the rewards received by the SMDP at the SLC-OLC layer for performing action $u_j(k)$ in local state $x(k)$. The second term, $\tilde V^{\pi}(x)$, completes the sum by accounting for the rewards earned for completing a mission at the OLC-TLC layer.

Discount rate, $\beta$: the relative weight of future rewards, $0 < \beta < 1$.

The objective of the SMDP model at the SLC-OLC layer is to determine an optimal policy, i.e., a mapping from states to actions, such that the value function (expected total cost) is minimized. The value function of an initial state $x = x(0)$ for policy $\pi$ is denoted as:


$$V^{\pi}(x) = E^{\pi}\Bigl[\sum_{k=0}^{K-1} \beta^k R(T(k) \mid x(k), u_j(k)) + \beta^K R(T(K) \mid x(K))\Bigr], \qquad (9)$$

where  K  is  the  number  of  decision  epochs  (planning  horizon). 
SMDP formulation at the OLC-TLC layer ~ $\langle S, Y, \Pi(T), \mathcal{R}(T) \rangle$

State space, S: The goal model is best visualized as a network of action alternatives and their respective outcomes via a directed acyclic graph, termed the AGA graph, $\mathcal{G}(T, A)$, with a task set $T = \{t_i\}_{i=1}^{n_t} \cup \{o_i\}_{i=1}^{n_o}$, $t_i \in \{0, 1\}$, $o_i \in \{\text{OR}, \text{AND}, \text{XOR}\}$, and an action set $A = \{a_j\}_{j=1}^{n_a}$, $a_j \in \{0, 1\}$, where $t_i$ is a task node, $o_i$ is a logic node, $a_j$ is an action, and $n_t$, $n_o$ and $n_a$ are the numbers of task nodes, logic nodes and action nodes, respectively (see Fig. 5). Let the binary representation of a goal state $x_h \in S$ be an $n_t$-dimensional vector $t = [t_1 \ldots t_{n_t-1}\ t_{n_t}]$, whose $l$th bit is 1 or 0 depending on whether the $l$th goal has been successfully achieved or not. However, not all $t$ with $0 \le \sum_{l=1}^{n_t} t_l 2^{l-1} \le 2^{n_t} - 1$ are valid goal states. For example, the state $x_h = [1\,1\,0\,1\,0\,1\,1\,1] = 235$ is not valid for the AGA graph in Fig. 5, because $t_{2.7}$ cannot be 1 if either $t_{2.4}$ or $t_{2.5}$ is unattained.

The validity of a goal state $x_h = t$ is established via a set of logical functions $\{g_{t_i}(t)\}$ defined in [14]. The logical functions $\{g_{t_i}(t)\}$ are as follows: $g_{t_{2.1}} = t_{2.1} = 1$; $g_{t_{2.2}} = g_{t_{2.3}} = g_{t_{2.4}} = g_{t_{2.5}} = 0$ or 1; $g_{t_{2.6}} = t_{2.2} \oplus t_{2.3}$; $g_{t_{2.7}} = t_{2.4} \cdot t_{2.5}$; $g_{t_{2.8}} = t_{2.6} + t_{2.7}$. Due to the $g_{t_i}(t)$ requirements, the cardinality of $S$, $n_h$, depends largely on the number of unconstrained goals (e.g., $t_{2.2}$, $t_{2.3}$, $t_{2.4}$ and $t_{2.5}$) rather than on $n_t$. Thus, instead of $2^8$, $n_h$ can be as small as 17 in this case. Moreover, the absorbing states, i.e., the set of valid goal states representing the desired terminal goal states (e.g., all valid goal states with $t_{2.8} = 1$), can simply be merged into a single state, which reduces $n_h$ even further. For example, a subset of such states is highlighted in Table 2. Consequently, $n_h$ is reduced from the original 256 to 17. The list of all valid goal states (one per row) of tasks, along with the tactical attributes (successfully achieved, not achieved) characterizing them, is shown in Table 2. For example, state $x_3 = [1\ 0\ 1\ 0\ 0\ 0\ 0\ 0]$ corresponds to $t_{2.1} = 1$, $t_{2.2} = 0$, $t_{2.3} = 1$, $t_{2.4} = 0$, $t_{2.5} = 0$, $t_{2.6} = 0$, $t_{2.7} = 0$ and $t_{2.8} = 0$. That is, the successfully achieved tasks of this state are tasks 2.1 and 2.3.
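A sketch of the validity check for the mission 2 goal vector, under two assumptions: the logical functions act as preconditions (a goal may be 1 only if its logic-node condition holds), and the operator between $t_{2.2}$ and $t_{2.3}$ is XOR, one of the logic-node types listed for the AGA graph. The integer code takes $t_{2.1}$ as the least significant bit, which is how $[1\,1\,0\,1\,0\,1\,1\,1]$ maps to 235. Function names are hypothetical.

```python
def encode(t):
    """Integer code sum_l t_l 2^(l-1), with t_{2.1} as the least significant bit."""
    return sum(bit << l for l, bit in enumerate(t))


def is_valid(t):
    """Check the logical functions of mission 2, read as preconditions."""
    t21, t22, t23, t24, t25, t26, t27, t28 = t
    if t21 != 1:                     # g_t2.1: the initial goal is achieved
        return False
    if t26 and not (t22 ^ t23):      # g_t2.6 = t2.2 XOR t2.3
        return False
    if t27 and not (t24 and t25):    # g_t2.7 = t2.4 AND t2.5
        return False
    if t28 and not (t26 or t27):     # g_t2.8 = t2.6 OR t2.7
        return False
    return True


bad = [1, 1, 0, 1, 0, 1, 1, 1]
print(encode(bad), is_valid(bad))          # 235 False
print(is_valid([1, 0, 1, 0, 0, 0, 0, 0]))  # True (state x_3)
```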

Action set, Y: The validity of a control action, $Y = \{a : a_j \in g_{a_j}(a),\ \forall j = 1, \ldots, n_a\}$, $n_u = |Y|$, is established via a set of functions $\{g_{a_j}(a)\}$, as in [14]. The logical functions $\{g_{a_j}(a)\}$ are as follows: $g_{a_{2.1}} = g_{a_{2.2}} = g_{a_{2.3}} = 0$ or 1; $g_{a_{2.4}} = \bar a_{2.1}$; $g_{a_{2.5}} = \bar a_{2.2}\, \bar a_{2.3}$. The overbar denotes the logical complement of its argument. The function $g_{a_{2.4}}$ specifies that $a_{2.4}$ is only allowed if $a_{2.1}$ is not selected. Likewise, the function $g_{a_{2.5}}$ specifies that the inclusion of $a_{2.5}$ requires the exclusion of $a_{2.2}$ and $a_{2.3}$. The list of all valid control actions is shown in Table 3. Table 10 lists all goal states reachable from each valid goal state via the application of a feasible control action [14].
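Enumerating the action vectors $(a_{2.1}, \ldots, a_{2.5})$ allowed by $g_{a_{2.4}}$ and $g_{a_{2.5}}$ gives a quick check on $n_u$. This sketch assumes the null action (nothing selected) is excluded; under that assumption it yields 14 actions. The function name is hypothetical.

```python
from itertools import product


def valid_actions():
    """Action vectors (a2.1, ..., a2.5) allowed by the control conditions."""
    valid = []
    for a in product((0, 1), repeat=5):
        a21, a22, a23, a24, a25 = a
        if a24 and a21:              # g_a2.4: a2.4 only if a2.1 is not selected
            continue
        if a25 and (a22 or a23):     # g_a2.5: a2.5 excludes a2.2 and a2.3
            continue
        if any(a):                   # assumption: drop the null action
            valid.append(a)
    return valid


print(len(valid_actions()))  # 14
```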

State transition probabilities, $\{\Pi(x(k+1), T(k) \mid x(k), u_j(k))\}$: given the current state $x(k)$ and an action $u_j(k)$, the probability that $\{x(k+1), T(k)\}$ is the next state at the decision epoch after the holding time $T(k)$ at the OLC-TLC layer is denoted by $\Pi(x(k+1), T(k) \mid x(k), u_j(k))$:

$$\Pi(x(k+1), T(k) \mid x(k), u_j(k)) = \Pi(x(k+1) \mid T(k), x(k), u_j(k))\, \Phi(T(k) \mid x(k), u_j(k)) \qquad (10)$$
where $\Pi(x(k+1) \mid T(k), x(k), u_j(k))$ denotes the state transition probability given the current state $x(k)$, an action $u_j(k)$, and the holding time $T(k)$ at the OLC-TLC layer, and $\Phi(T(k) \mid x(k), u_j(k))$ is the holding time distribution of eq. (2). Here, $\Pi(x(k+1) \mid T(k), x(k), u_j(k))$ is defined as the task success probability, based on the asset allocation and task execution activities at the tactical level.

Reward (Cost) structure, $\{\mathcal{R}(T(k) \mid x(k), u_j(k))\}$: The reward $\mathcal{R}(T(k) \mid x(k), u_j(k))$ of an action $u_j(k)$ is represented by the intensity score function, which gives the probability of answering a particular response category correctly [15]. The intensity score function is defined as the cumulative form of the logistic function:


$$\mathcal{R}(T(k) \mid x(k), u_j(k)) = \frac{1}{1 + \exp\bigl(-b^{\pi}\,[\mathcal{R}(x(k), u_j(k)) - r(T(k))]\bigr)}, \qquad (11)$$

where $b^{\pi}$ is the mission weight of the policy (action) $\pi$ at the SLC-OLC layer, $\mathcal{R}(x(k), u_j(k))$ is the estimated reward of mission difficulty in terms of the resource requirements of the tasks, and $r(T(k))$ is the accrued reward of assigning resources to tasks, calculated in terms of excess resource allocation.
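A direct transcription of eq. (11) together with the max-normalizations described in Section IV. The sign inside the exponential is an assumption (the reward rises above 0.5 when $\mathcal{R}(x(k), u_j(k))$ exceeds $r(T(k))$), and the function names and values are hypothetical.

```python
from math import exp


def normalize(values):
    """Divide by the maximum, as in q_i / max(q_i) and r_d / max(r_d)."""
    top = max(values)
    return [v / top for v in values]


def intensity_reward(b, difficulty_reward, accrued_reward):
    """Eq. (11): logistic function of R(x(k), u_j(k)) - r(T(k)) with slope b^pi."""
    return 1.0 / (1.0 + exp(-b * (difficulty_reward - accrued_reward)))


# Hypothetical task requirements and resource redundancies.
difficulty = normalize([10.0, 20.0, 40.0])   # [0.25, 0.5, 1.0]
redundancy = normalize([1.0, 4.0])           # [0.25, 1.0]
print(intensity_reward(2.0, difficulty[1], redundancy[0]))
```

A larger mission weight $b^{\pi}$ makes the reward switch more sharply around the point where the two normalized terms are equal.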

The value function of an initial state $x = x(0)$ for policy $\pi$ is written as:


$$\tilde V^{\pi}(x) = E^{\pi}\Bigl[\sum_{k=0}^{K-1} \beta^k \mathcal{R}(T(k) \mid x(k), u_j(k)) + \beta^K \mathcal{R}(T(K) \mid x(K))\Bigr] \qquad (12)$$


where $\beta$ is the discount rate and $K$ is the number of decision epochs at the OLC-TLC layer.


Table 10. $\{S(k+1) \mid S(k), Y(x)\}$ reachability.
Introduction: Motivation and Objectives

Multi-Level  C2  Coordination  Modeling  Framework 

■  Strategic  Level  Control  (SLC) 

■  Operational  Level  Control  (OLC) 

■  Tactical  Level  Control  (TLC) 

Two  Coordinating  Decision  Layers  to  Explore  Linkages  with 
Strategic  and  Tactical  Levels  at  the  Operational  Level 

■  Strategic-Operational  (SLC-OLC)  Interface  Layer 

■  Operational-Tactical  (OLC-TLC)  Interface  Layer 

Application  to  a  Multi-mission  Scenario 

■  Major  combat  operations 

■  Humanitarian  assistance  and  disaster  relief  (HA/DR) 
Introduction
Motivation

■ Maritime Headquarters with Maritime Operations Center (MHQ/MOC): motivated by C2 gaps identified in recent national-level crises, e.g., September 11, Operation Iraqi Freedom (OIF), and humanitarian assistance and disaster relief (HA/DR) during Hurricane Katrina


■  MHQ/MOC*  is  the  Navy’s  new  concept 
at  the  operational  level  with  the 
capability  to  assess,  plan,  and  execute 
multiple  missions 


* MHQ/MOC within the command levels (figure): Policy Level, Strategic Level, Operational Level and Tactical Level, over naval forces and other assigned or attached forces


■ Objectives: provide a multi-level adaptive C2 organizational solution linking the tactical, operational and strategic levels of MHQ/MOC for assessing, planning and executing multiple missions


** Range of military operations, in order of increasing scale and complexity: normal and routine operations, HA/DR, stability operations, major combat operations
Multi-level C2 Coordination Modeling Framework

■ Strategic Level: Decides on the goals of the overall mission (commander's intent)

■ Operational Level: Estimates the task-resource requirements, allocates assets to tasks under strategic guidance, and monitors mission progress

■ Tactical Level: Sequences and executes tasks
[Figure: multi-level C2 control loop linking the Strategic Level (set mission goals; desired outcomes), the Operational Level (mismatch; FOPS plan; allocation and re-allocation; monitoring of asset state and task state; measurement), the Tactical Level (sequence and execute; task outcomes) and the battlefield]


Key question addressed: how can the multi-level coordination problem be solved in a distributed way?
Multi-level C2 Coordination Process

Two coordination layers


■  SLC-OLC  layer:  Selects  DIME  (diplomatic,  information,  military  and  economic)  action 
strategies  by  solving  a  semi-Markov  decision  problem  (SMDP) 

■ OLC-TLC layer: Plans courses of action for individual missions by solving mission-specific SMDPs

Coordination Process:
1) The SLC-OLC layer transmits mission weights to the OLC-TLC layer as the commander’s intent.
2) Each mission planner at the OLC-TLC layer decides on a state-dependent action path in its mission graph.
3) The minimum mission completion time is transmitted back to the SLC-OLC layer, where it sets the next decision epoch of the SLC-OLC SMDP.
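The three-step exchange above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the planner interface (a callable mapping a mission weight to a completion time), the function name coordination_step, and the toy planners in the usage block are all hypothetical assumptions.

```python
from typing import Callable, Dict, Tuple

# Hypothetical planner interface: given its mission weight, an OLC-TLC
# mission planner returns the completion time of its chosen action path.
Planner = Callable[[float], float]


def coordination_step(
    mission_weights: Dict[str, float],
    planners: Dict[str, Planner],
) -> Tuple[str, float]:
    """One SLC-OLC / OLC-TLC exchange (illustrative sketch).

    1) The SLC-OLC layer sends each mission's weight (commander's intent).
    2) Each OLC-TLC planner plans its mission and reports a completion time.
    3) The earliest completion is returned; it sets the next SLC-OLC epoch.
    """
    completion_times = {
        mission: planners[mission](weight)
        for mission, weight in mission_weights.items()
    }
    first = min(completion_times, key=completion_times.get)
    return first, completion_times[first]


if __name__ == "__main__":
    # Toy planners: a larger weight buys more assets and a faster completion
    # (a made-up relation used only for illustration).
    planners = {
        "seaport_capture": lambda w: 10.0 / w,
        "hadr_rescue": lambda w: 6.0 / w,
    }
    mission, epoch = coordination_step(
        {"seaport_capture": 2.0, "hadr_rescue": 1.0}, planners
    )
    print(mission, epoch)  # seaport_capture 5.0
```

Returning only the earliest completion mirrors step 3: the SLC-OLC layer re-plans as soon as any mission finishes, rather than waiting for all of them.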


[Figure: SLC-OLC coordination between the Strategic Level (Combatant Commander) and the Operational Level (MOC Commander): a DIME action graph from START to END over diplomatic, information and military actions; mission weights passed down; completion times and task-asset status passed up]
SMDP: SLC-OLC Layer

■ Key Issue: sequencing DIME actions to achieve the desired effects

■ Approach: formulated as a semi-Markov decision problem (SMDP)

■ State: combination of missions


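■ Overall transition probability: the probability of mission completion over the holding (state occupancy) time T(k):

$$P\big(x(k+1), T(k) \mid x(k), u_j(k)\big) = P\big(x(k+1) \mid T(k), x(k), u_j(k)\big)\, F\big(T(k) \mid x(k), u_j(k)\big)$$

where $F(\cdot \mid x(k), u_j(k))$ is the holding time distribution function.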


■  Action:  State-based  DIME  action-paths 

■  Policy:  Best  action  to  take  in  each  state 
at  each  decision  epoch 


■ Reward structure: DIME action cost and reward for mission completion over the holding time T(k):

$$R\big(T(k) \mid x(k), u_j(k)\big) = \sum_{i=1}^{N} \Big[ R\big(z_i(k), u_j(k)\big) + V_i^{\pi}(x) \Big]$$

$V_i^{\pi}(x)$: the expected total reward of an option (path) at the OLC-TLC layer


■ The expected reward of a policy π starting at an initial state x(0):

$$V^{\pi}\big(x(0)\big) = E\Big[\sum_{k=0}^{K-1} \beta^{k} R\big(T(k) \mid x(k), u_j(k)\big) + \beta^{K} R\big(T(K) \mid x(K)\big)\Big]$$

K: the number of decision epochs
β: discount factor
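A minimal sketch of how such a finite-horizon SMDP could be solved by backward induction over the K decision epochs, using the factorization P(x(k+1), T(k) | x(k), u_j(k)) = P(x(k+1) | T(k), x(k), u_j(k)) F(T(k) | x(k), u_j(k)). Assumptions, none taken from the source: holding times take a few discrete values, the state is the vector of mission statuses z_i, the terminal reward depends only on x(K), and the two-mission toy model (action names, completion probabilities, costs) is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]  # z_i = 1: mission i is still present


def backward_induction(
    states: List[State],
    actions: List[str],
    holding_times: List[int],
    F: Callable[[int, State, str], float],         # F(T | x, u)
    P: Callable[[State, int, State, str], float],  # P(x' | T, x, u)
    R: Callable[[int, State, str], float],         # R(T | x, u)
    R_terminal: Callable[[State], float],          # assumed terminal reward at epoch K
    beta: float,
    K: int,
) -> Tuple[Dict[State, float], List[Dict[State, str]]]:
    """Finite-horizon dynamic programming over K decision epochs.

    Returns the optimal values at epoch 0 and one decision rule per epoch.
    """
    V = {x: R_terminal(x) for x in states}  # values at epoch K
    policy: List[Dict[State, str]] = []
    for _ in range(K):  # sweep epochs K-1, ..., 0
        V_new: Dict[State, float] = {}
        rule: Dict[State, str] = {}
        for x in states:
            best_u, best_q = actions[0], float("-inf")
            for u in actions:
                q = 0.0
                for T in holding_times:
                    f = F(T, x, u)
                    if f == 0.0:
                        continue
                    # Joint kernel: P(x', T | x, u) = P(x' | T, x, u) F(T | x, u)
                    cont = sum(P(x2, T, x, u) * V[x2] for x2 in states)
                    q += f * (R(T, x, u) + beta * cont)
                if q > best_q:
                    best_u, best_q = u, q
            V_new[x], rule[x] = best_q, best_u
        V = V_new
        policy.insert(0, rule)
    return V, policy


# ---- Hypothetical two-mission toy model (all numbers are illustrative) ----
STATES: List[State] = list(product((0, 1), repeat=2))
ACTIONS = ["military", "diplomatic"]
TARGET = {"military": 0, "diplomatic": 1}  # mission each action advances


def F(T: int, x: State, u: str) -> float:
    return 0.5 if T in (1, 2) else 0.0  # uniform holding time on {1, 2}


def P(x2: State, T: int, x: State, u: str) -> float:
    i = TARGET[u]
    if x[i] == 0:  # targeted mission already finished: state unchanged
        return 1.0 if x2 == x else 0.0
    done = tuple(0 if j == i else z for j, z in enumerate(x))
    p = min(1.0, 0.4 * T)  # longer holding time, higher completion chance
    if x2 == done:
        return p
    return 1.0 - p if x2 == x else 0.0


def R(T: int, x: State, u: str) -> float:
    return -1.0 if any(x) else 0.0  # action cost while missions remain


def R_terminal(x: State) -> float:
    return 10.0 * (len(x) - sum(x))  # reward per completed mission


if __name__ == "__main__":
    V, policy = backward_induction(STATES, ACTIONS, [1, 2], F, P, R, R_terminal, beta=0.9, K=3)
    print(V[(1, 1)], policy[0])
```

The recursion V_k(x) = max_u E[R(T(k) | x, u) + β V_{k+1}(x(k+1))] unrolls to the discounted sum above term by term; policy[k] is the decision rule ("best action in each state") at epoch k.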
SMDP: OLC-TLC Layer

■ Key Issue: find the optimal path through a mission graph to complete the mission

■ Approach: formulated as a distributed semi-Markov decision problem over a mission graph (Meirina et al., IEEE T-SMCA, 2008)

■ State: combination of sub-goal (task) states

■ Action: state-based goal (task) actions (options)

■ Policy: best action to take in each state at each decision epoch


■ Transition probability: probability of completing a mission via a path in the mission graph over the holding time T(k):

$$P\big(x(k+1), T(k) \mid x(k), u_j(k)\big) = P\big(x(k+1) \mid T(k), x(k), u_j(k)\big)\, F\big(T(k) \mid x(k), u_j(k)\big)$$


■ Reward structure: a function of mission difficulty and task accuracy for the assigned resources:

$$R\big(T(k) \mid x(k), u_j(k)\big) = R\big(x(k), u_j(k)\big) + r\big(T(k)\big)$$

r(T(k)): the reward over the holding time T(k)


■ Expected reward of a policy π starting at initial state x(0):

$$V^{\pi}\big(x(0)\big) = E\Big[\sum_{k=0}^{K-1} \beta^{k} R\big(T(k) \mid x(k), u_j(k)\big) + \beta^{K} R\big(T(K) \mid x(K)\big)\Big]$$

K: the number of decision epochs
β: discount factor
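For a fixed policy, V^π(x(0)) can also be estimated by simulation, which avoids enumerating the sub-goal state space of a large mission graph. The sketch below is a Monte Carlo estimate under stated assumptions: the three-task mission, its success probabilities and holding-time ranges, R(x, u) (task value weighted by success probability, standing in for "task accuracy") and r(T) (a cost proportional to T) are all hypothetical, and the terminal term β^K R(T(K) | x(K)) is taken as zero.

```python
import random
from typing import Dict, FrozenSet, Optional, Tuple

State = FrozenSet[str]  # set of completed sub-goals (tasks)

# Hypothetical options: task -> (success probability, holding-time range, task value)
OPTIONS: Dict[str, Tuple[float, Tuple[int, int], float]] = {
    "secure_approach": (0.9, (1, 3), 2.0),
    "clear_mines": (0.7, (2, 4), 3.0),
    "seize_port": (0.8, (1, 2), 5.0),
}
ORDER = ["secure_approach", "clear_mines", "seize_port"]


def R(x: State, u: str) -> float:
    """Hypothetical R(x, u): task value weighted by its success probability."""
    p, _, value = OPTIONS[u]
    return p * value


def r(T: int) -> float:
    """Hypothetical r(T): a cost proportional to the holding time."""
    return -0.5 * T


def ordered_policy(x: State) -> Optional[str]:
    """A fixed policy: the first unfinished task in ORDER (None when done)."""
    for task in ORDER:
        if task not in x:
            return task
    return None


def simulate(x0: State, K: int, beta: float, rng: random.Random) -> float:
    """Discounted reward of one episode; the terminal term is taken as zero."""
    x, total = x0, 0.0
    for k in range(K):
        u = ordered_policy(x)
        if u is None:  # all sub-goals attained: the mission is complete
            break
        p, (t_min, t_max), _ = OPTIONS[u]
        T = rng.randint(t_min, t_max)  # sampled holding time T(k), inclusive range
        total += beta**k * (R(x, u) + r(T))  # R(T(k) | x(k), u(k)) = R(x, u) + r(T)
        if rng.random() < p:  # task succeeds: sub-goal attained
            x = x | {u}
    return total


def estimate_value(x0: State, K: int = 10, beta: float = 0.95,
                   n: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of V^pi(x(0)) for the fixed policy."""
    rng = random.Random(seed)
    return sum(simulate(x0, K, beta, rng) for _ in range(n)) / n


if __name__ == "__main__":
    print(round(estimate_value(frozenset()), 3))
```

Seeding a private random.Random keeps estimates reproducible, which is convenient when comparing alternative paths through the same mission graph.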
Application to a Multi-mission Scenario


Mission  Space 


■ Mission 1: capture a seaport to allow the introduction of follow-on forces (major combat operation)

■ Mission 2: rescue activity after a hurricane in the homeland (HA/DR)


■ SLC-OLC Layer

• Solve the SMDP at the SLC-OLC layer to obtain the DIME action policy

• Transmit mission weights to the OLC-TLC layer




Application to a Multi-mission Scenario


■ OLC-TLC Layer

• Task-asset assignment results are provided by the TLC to the SMDPs at the OLC-TLC layer

• Each OLC-TLC mission planner decides on a state-dependent action path in its mission graph

• Transmit the mission completion time to the SLC-OLC layer
Summary and Future Work


Multi-level  Operational  C2  Holonic  Reference  Architecture 

The architecture can be applied to the Navy’s new MHQ/MOC, linking tactical-, operational- and strategic-level control

•  Strategic  Level  Control  (SLC):  centralized  assessment 

•  Operational  Level  Control  (OLC):  networked  distributed  planning 

• Tactical Level Control (TLC): decentralized execution

C2 coordination issues at the three levels, namely DIME action selection (future plans) and mission planning (future operations and current operations), can be modeled with SMDP formalisms within the proposed holonic architecture

The two layers share the outcomes of their SMDP solutions (e.g., missions to be planned, passed SLC-OLC → OLC-TLC; mission completion times, passed OLC-TLC → SLC-OLC) to reach consensus

Future  Work 

■  Game-theoretic  incentive  mechanisms  to  induce  collaborative  behavior 

■  Distributed  auction  algorithms  with  partial  information  to  decide  on  the 
best  organizational  structures 
Holding Time Distribution Function


where

N: number of missions

$\omega_l$: mission termination condition

$z_i$: status of mission i; $z_i = 1$ denotes the presence of a mission
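As a hedged illustration consistent with the coordination process (the SLC-OLC decision epoch is set by the minimum mission completion time), suppose the completion times T_i of the present missions (z_i = 1) are independent with distribution functions F_i. The holding time T = min_i T_i then has the distribution function F(t) = 1 - Π_{i: z_i = 1} (1 - F_i(t)). The sketch below evaluates this; the independence assumption and the exponential completion-time model are illustrative choices, not taken from the slides.

```python
import math
from typing import Callable, Sequence

CDF = Callable[[float], float]


def holding_time_cdf(t: float, z: Sequence[int], mission_cdfs: Sequence[CDF]) -> float:
    """P(T <= t) for T = min of the completion times of present missions (z_i = 1).

    Assumes mission completion times are mutually independent. With no mission
    present the product is empty and the function returns 0.
    """
    survival = 1.0
    for z_i, F_i in zip(z, mission_cdfs):
        if z_i == 1:
            survival *= 1.0 - F_i(t)  # P(T_i > t)
    return 1.0 - survival


def exponential_cdf(rate: float) -> CDF:
    """Hypothetical completion-time model: exponential with the given rate."""
    return lambda t: (1.0 - math.exp(-rate * t)) if t > 0 else 0.0


if __name__ == "__main__":
    cdfs = [exponential_cdf(0.2), exponential_cdf(0.3)]
    print(holding_time_cdf(2.0, [1, 1], cdfs))
```

For exponential completion times the result coincides with an exponential distribution whose rate is the sum of the present missions' rates, here 1 - exp(-(0.2 + 0.3) t), which is a quick sanity check on the product formula.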